From patchwork Fri Nov 8 19:36:21 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13868888
From: Keith Busch
CC: Keith Busch, Bart Van Assche
Subject: [PATCHv11 1/9] block: use generic u16 for write hints
Date: Fri, 8 Nov 2024 11:36:21 -0800
Message-ID: <20241108193629.3817619-2-kbusch@meta.com>
In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com>
References: <20241108193629.3817619-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

This is still backwards compatible with lifetime hints. It just doesn't
constrain the hints to that definition. Using this type doesn't change
the size of either bio or request.

Reviewed-by: Bart Van Assche
Signed-off-by: Keith Busch
---
 include/linux/blk-mq.h    | 3 +--
 include/linux/blk_types.h | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2035fad3131fb..08ed7b5c4dbbf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,7 +8,6 @@
 #include
 #include
 #include
-#include

 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -156,7 +155,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif

-	enum rw_hint write_hint;
+	unsigned short write_hint;
 	unsigned short ioprio;

 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7e..6737795220e18 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -10,7 +10,6 @@
 #include
 #include
 #include
-#include

 struct bio_set;
 struct bio;
@@ -219,7 +218,7 @@ struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
-	enum rw_hint		bi_write_hint;
+	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;


From patchwork Fri Nov 8 19:36:22 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13868870
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv11 2/9] block: introduce max_write_hints queue limit
Date: Fri, 8 Nov 2024 11:36:22 -0800
Message-ID: <20241108193629.3817619-3-kbusch@meta.com>
In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com>
References: <20241108193629.3817619-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Drivers with hardware that supports write streams need a way to export
how many are available so applications can generically query this.

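For illustration only, and not part of the patch: a minimal userspace sketch of how an application might query this limit. It assumes a kernel carrying this series and assumes the attribute is exposed as /sys/block/<disk>/queue/max_write_hints. The helper names and the idea of passing a disk name such as "nvme0n1" are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a max_write_hints attribute, e.g. "8\n" or "0\n".
 * Returns the value, or -1 if the text is not a valid unsigned short.
 */
static long parse_max_write_hints(const char *text)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno || end == text || val > USHRT_MAX)
		return -1;

	/* Only trailing blanks or a newline may follow the number. */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	return (long)val;
}

/*
 * Read the limit for one disk (assumed sysfs layout, see above).
 * Returns -1 if the attribute is missing or unreadable, which is what
 * a kernel without this series would produce.
 */
static long read_max_write_hints(const char *disk)
{
	char path[256];
	char buf[32];
	long ret = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/max_write_hints", disk);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_max_write_hints(buf);
	fclose(f);
	return ret;
}
```

A result of 0 means the device reports no write hint support; a positive result N means hints 1 through N are valid, per the ABI text added by this patch.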
Signed-off-by: Keith Busch
---
 Documentation/ABI/stable/sysfs-block |  7 +++++++
 block/blk-settings.c                 |  3 +++
 block/blk-sysfs.c                    |  3 +++
 include/linux/blkdev.h               | 12 ++++++++++++
 4 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8353611107154..f2db2cabb8e75 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.

+What:		/sys/block//queue/max_write_hints
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write hints supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_hints, inclusive.

 What:		/sys/block//queue/max_segments
 Date:		March 2010
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5ee3d6d1448df..f9f831f104615 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;

 	/* Inherit limits from component devices */
+	lim->max_write_hints = USHRT_MAX;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
@@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);

+	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
+
 	alignment = queue_limit_alignment_offset(b, start);

 	/* Bottom device has different alignment. Check that it is
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ef4e13e247d9..1925ea23bd290 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -104,6 +104,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_hints)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -457,6 +458,7 @@ QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_hints, "max_write_hints");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");

 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -591,6 +593,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_hints_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b51a7c92e9be..1477f751ad8bd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -394,6 +394,8 @@ struct queue_limits {
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;

+	unsigned short		max_write_hints;
+
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;

@@ -1198,6 +1200,11 @@ static inline unsigned short queue_max_segments(const struct request_queue *q)
 	return q->limits.max_segments;
 }

+static inline unsigned short queue_max_write_hints(struct request_queue *q)
+{
+	return q->limits.max_write_hints;
+}
+
 static inline unsigned short queue_max_discard_segments(const struct request_queue *q)
 {
 	return q->limits.max_discard_segments;
@@ -1245,6 +1252,11 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }

+static inline unsigned short bdev_max_write_hints(struct block_device *bdev)
+{
+	return queue_max_write_hints(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;

From patchwork Fri Nov 8 19:36:23 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13868883
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv11 3/9] statx: add write hint information
Date: Fri, 8 Nov 2024 11:36:23 -0800
Message-ID: <20241108193629.3817619-4-kbusch@meta.com>
In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com>
References: <20241108193629.3817619-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

If requested on a raw block device, report the maximum write hint the
block device supports.

Signed-off-by: Keith Busch
---
 block/bdev.c              | 5 +++++
 fs/stat.c                 | 1 +
 include/linux/stat.h      | 1 +
 include/uapi/linux/stat.h | 3 ++-
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bdev.c b/block/bdev.c
index 738e3c8457e7f..9a59f0c882170 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1296,6 +1296,11 @@ void bdev_statx(struct path *path, struct kstat *stat,
 		stat->result_mask |= STATX_DIOALIGN;
 	}

+	if (request_mask & STATX_WRITE_HINT) {
+		stat->write_hint_max = bdev_max_write_hints(bdev);
+		stat->result_mask |= STATX_WRITE_HINT;
+	}
+
 	if (request_mask & STATX_WRITE_ATOMIC && bdev_can_atomic_write(bdev)) {
 		struct request_queue *bd_queue = bdev->bd_queue;

diff --git a/fs/stat.c b/fs/stat.c
index 41e598376d7e3..60bcd5c2e2a1d 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -704,6 +704,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
 	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
 	tmp.stx_atomic_write_segments_max = stat->atomic_write_segments_max;
+	tmp.stx_write_hint_max = stat->write_hint_max;

 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c5..48f0f64846a02 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -57,6 +57,7 @@ struct kstat {
 	u32		atomic_write_unit_min;
 	u32		atomic_write_unit_max;
 	u32		atomic_write_segments_max;
+	u32		write_hint_max;
 };

 /* These definitions are internal to the kernel for now. Mainly used by nfsd. */
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a252864416..10f5622c21113 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -132,7 +132,7 @@ struct statx {
 	__u32	stx_atomic_write_unit_max;	/* Max atomic write unit in bytes */
 	/* 0xb0 */
 	__u32	stx_atomic_write_segments_max;	/* Max atomic write segment count */
-	__u32	__spare1[1];
+	__u32	stx_write_hint_max;
 	/* 0xb8 */
 	__u64	__spare3[9];	/* Spare space for future expansion */
 	/* 0x100 */
@@ -164,6 +164,7 @@ struct statx {
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
 #define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_HINT	0x00020000U	/* Want/got write_hint_max */

 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */


From patchwork Fri Nov 8 19:36:24 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13868889
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv11 4/9] block: allow ability to limit partition write hints
Date: Fri, 8 Nov 2024 11:36:24 -0800
Message-ID: <20241108193629.3817619-5-kbusch@meta.com>
In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com>
References: <20241108193629.3817619-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

When multiple partitions are used, you may want to enforce different
subsets of the available write hints for each partition. Provide a
bitmap attribute of the available write hints, and allow an admin to
write a different mask to set the partition's allowed write hints.

Signed-off-by: Keith Busch
---
 Documentation/ABI/stable/sysfs-block |  7 +++++
 block/bdev.c                         | 17 +++++++++++
 block/partitions/core.c              | 45 ++++++++++++++++++++++++++--
 include/linux/blk_types.h            |  1 +
 4 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index f2db2cabb8e75..fa2db6b638d63 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -187,6 +187,13 @@ Description:
 		partition is offset from the internal allocation unit's
 		natural alignment.

+What:		/sys/block///write_hint_mask
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		The mask of allowed write hints. You can limit which hints the
+		block layer will use by writing a new mask. Only the first
+		partition can access all the write hints by default.

 What:		/sys/block///stat
 Date:		February 2008
diff --git a/block/bdev.c b/block/bdev.c
index 9a59f0c882170..e6f9d19db599b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -415,6 +415,7 @@ void __init bdev_cache_init(void)
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 {
 	struct block_device *bdev;
+	unsigned short write_hint;
 	struct inode *inode;

 	inode = new_inode(blockdev_superblock);
@@ -440,6 +441,22 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		return NULL;
 	}
 	bdev->bd_disk = disk;
+
+	write_hint = bdev_max_write_hints(bdev);
+	if (write_hint) {
+		bdev->write_hint_mask = bitmap_alloc(write_hint, GFP_KERNEL);
+		if (!bdev->write_hint_mask) {
+			free_percpu(bdev->bd_stats);
+			iput(inode);
+			return NULL;
+		}
+
+		if (partno == 1)
+			bitmap_set(bdev->write_hint_mask, 0, write_hint);
+		else
+			bitmap_clear(bdev->write_hint_mask, 0, write_hint);
+	}
+
 	return bdev;
 }

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 815ed33caa1b8..c71a5d34339d7 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -203,6 +203,41 @@ static ssize_t part_discard_alignment_show(struct device *dev,
 	return sprintf(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
 }

+static ssize_t part_write_hint_mask_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	unsigned short max_write_hints = bdev_max_write_hints(bdev);
+
+	if (!max_write_hints)
+		return sprintf(buf, "0");
+	return sprintf(buf, "%*pb\n", max_write_hints, bdev->write_hint_mask);
+}
+
+static ssize_t part_write_hint_mask_store(struct device *dev,
+					  struct device_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	unsigned short max_write_hints = bdev_max_write_hints(bdev);
+	unsigned long *new_mask;
+
+	if (!max_write_hints)
+		return count;
+
+	new_mask = bitmap_alloc(max_write_hints, GFP_KERNEL);
+	if (!new_mask)
+		return -ENOMEM;
+
+	bitmap_parse(buf, count, new_mask, max_write_hints);
+	bitmap_copy(bdev->write_hint_mask, new_mask, max_write_hints);
+	smp_wmb();
+	bitmap_free(new_mask);
+
+	return count;
+}
+
 static DEVICE_ATTR(partition, 0444, part_partition_show, NULL);
 static DEVICE_ATTR(start, 0444, part_start_show, NULL);
 static DEVICE_ATTR(size, 0444, part_size_show, NULL);
@@ -211,6 +246,8 @@ static DEVICE_ATTR(alignment_offset, 0444, part_alignment_offset_show, NULL);
 static DEVICE_ATTR(discard_alignment, 0444, part_discard_alignment_show, NULL);
 static DEVICE_ATTR(stat, 0444, part_stat_show, NULL);
 static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
+static DEVICE_ATTR(write_hint_mask, 0644, part_write_hint_mask_show,
+		   part_write_hint_mask_store);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
 	__ATTR(make-it-fail, 0644, part_fail_show, part_fail_store);
@@ -225,6 +262,7 @@ static struct attribute *part_attrs[] = {
 	&dev_attr_discard_alignment.attr,
 	&dev_attr_stat.attr,
 	&dev_attr_inflight.attr,
+	&dev_attr_write_hint_mask.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
 #endif
@@ -245,8 +283,11 @@ static const struct attribute_group *part_attr_groups[] = {

 static void part_release(struct device *dev)
 {
-	put_disk(dev_to_bdev(dev)->bd_disk);
-	bdev_drop(dev_to_bdev(dev));
+	struct block_device *part = dev_to_bdev(dev);
+
+	bitmap_free(part->write_hint_mask);
+	put_disk(part->bd_disk);
+	bdev_drop(part);
 }

 static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6737795220e18..af430e543f7f7 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -73,6 +73,7 @@ struct block_device {
 #ifdef CONFIG_SECURITY
 	void			*bd_security;
 #endif
+	unsigned long		*write_hint_mask;
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path

From patchwork Fri Nov 8 19:36:25 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13868881
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv11 5/9] block, fs: add write hint to kiocb
Date: Fri, 8 Nov 2024 11:36:25 -0800
Message-ID: <20241108193629.3817619-6-kbusch@meta.com>
In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com>
References: <20241108193629.3817619-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

This prepares for sources other than the inode to provide a write hint.
The block layer will use it for direct IO if the requested hint is
within the block device's allowed hints. The hint field in the kiocb
structure fits in an existing 2-byte hole, so its size is not changed.

Signed-off-by: Keith Busch
---
 block/fops.c       | 31 ++++++++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 2d01c90076813..bb3855ee044f0 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -71,7 +71,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio.bi_write_hint = iocb->ki_write_hint;
 	bio.bi_ioprio = iocb->ki_ioprio;
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio.bi_opf |= REQ_ATOMIC;
@@ -200,7 +200,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+		bio->bi_write_hint = iocb->ki_write_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -316,7 +316,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio->bi_write_hint = iocb->ki_write_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
@@ -362,6 +362,23 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	return -EIOCBQUEUED;
 }
 
+static int blkdev_write_hint(struct kiocb *iocb, struct block_device *bdev)
+{
+	u16 hint = iocb->ki_write_hint;
+
+	if (!hint)
+		return file_inode(iocb->ki_filp)->i_write_hint;
+
+	if (hint > bdev_max_write_hints(bdev))
+		return -EINVAL;
+
+	if (bdev_is_partition(bdev) &&
+	    !test_bit(hint - 1, bdev->write_hint_mask))
+		return -EINVAL;
+
+	return hint;
+}
+
 static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
@@ -373,6 +390,14 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (blkdev_dio_invalid(bdev, iocb, iter))
 		return -EINVAL;
 
+	if (iov_iter_rw(iter) == WRITE) {
+		int hint = blkdev_write_hint(iocb, bdev);
+
+		if (hint < 0)
+			return hint;
+		iocb->ki_write_hint = hint;
+	}
+
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
 		if (is_sync_kiocb(iocb))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4b5cad44a1268..1a00accf412e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -370,6 +370,7 @@ struct kiocb {
 	void *private;
 	int ki_flags;
 	u16 ki_ioprio; /* See linux/ioprio.h */
+	u16 ki_write_hint;
 	union {
 		/*
 		 * Only used for async buffered reads, where it denotes the

From patchwork Fri Nov 8 19:36:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 13868910 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 680BC1F26E6 for ; Fri, 8 Nov 2024 20:05:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731096322; cv=none; b=f6beCFvQdJYwekdGbkJfpVA0cfw3WmNARF09A06Ex27pqGJ4tbYfDJ801pUNCZEqMIW/pRU1j2fL5OoItKvG49aIv+vI/zEiARpEk1sjD1Nbu3XE/UqYXf1bihRMFbdm0RY354TE/rT9E7hLcM97P4G2AwTQ1++kwEOPf5BV/Qo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731096322; c=relaxed/simple; bh=DMSc5tz9jrIHoidVsK/PYPAouaSun5T57ModES7HZis=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=cGd0N6w2X8i+nizDu6eccSD5QxvPcK1nGUXg48O+pHQBf7tHYWQ1Ckr+B8mdCTLU3RDc3EVEJlai0mLwCZtyc7rw9wMu8fwf2gQ03w6dr8imTAARLCCIPrDhczE582uU/poO0l/7m63yJeqnOFvjK70/BBPxVxBZ+Fb8ALzptEk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=VPtbd3Y/; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="VPtbd3Y/" Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4A8HPfPT013959 for ; Fri, 8 Nov 2024 12:05:19 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2021-q4; bh=YZD1lyriaTdTAo7xj8F47G4Z4f/YwLOtay5hq6Zxzmo=; b=VPtbd3Y/0aGh ViYVeslDUPnLVOKyWrbgKeQXHnTtMB7tiSkymLTFgdB3/H3DKeh6mYsqovwbx4FG DkyFRT5s5zbmMjWVdOIBtE3+VsAKdWccOT9u3LsGyQk7jBiSfu3EPSiGbxgL9pET PfUSK5b/ZQ5+mmDkKP/XM8a85Epduu8i8BOt6MxTv6wPwNkvmFw46KbUuDRZMhhz qcu6ACU/VoTR2SX3XB4crF6sGeSMQb8EgDFGBb5sg4fRX7ME+y7FLqkaiOAg+Pgm cSwX4IWwB6kPqkJc5iwxzOjbdTzY3avQJOGhRwj2hTz+Sp45OiEaiy+4vlfek1yS 8lnwhbvKCg== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42sp2t9srg-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 08 Nov 2024 12:05:19 -0800 (PST) Received: from twshared26967.08.ash9.facebook.com (2620:10d:c0a8:1c::11) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; 
Fri, 8 Nov 2024 20:04:59 +0000 Received: by devbig638.nha1.facebook.com (Postfix, from userid 544533) id A640814E3A03A; Fri, 8 Nov 2024 11:36:58 -0800 (PST) From: Keith Busch To: , , , , , CC: , , , , , Nitesh Shetty , Keith Busch Subject: [PATCHv11 6/9] io_uring: enable per-io hinting capability Date: Fri, 8 Nov 2024 11:36:26 -0800 Message-ID: <20241108193629.3817619-7-kbusch@meta.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com> References: <20241108193629.3817619-1-kbusch@meta.com> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: ukwXsvjAlINp_m7R76rZgIJyerjZpLy8 X-Proofpoint-ORIG-GUID: ukwXsvjAlINp_m7R76rZgIJyerjZpLy8 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01

From: Kanchan Joshi

With the F_SET_RW_HINT fcntl, a user can set a hint on the file inode, and all subsequent writes to the file pass that hint value down. This can be limiting for a block device, as all writes are tagged with a single lifetime hint value. Concurrent writes (with different hint values) are hard to manage. Per-IO hinting solves that problem.

Allow userspace to pass additional metadata in the SQE:

	__u16 write_hint;

If the hint is provided, filesystems may optionally use it. A filesystem may ignore this field if it does not support per-IO hints, or if the value is invalid for its backing storage. Just like the inode hints, requesting a value that is not supported by the hardware is not an error.

Signed-off-by: Kanchan Joshi
Signed-off-by: Nitesh Shetty
Signed-off-by: Keith Busch
---
 include/uapi/linux/io_uring.h | 4 ++++
 io_uring/io_uring.c           | 2 ++
 io_uring/rw.c                 | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 56cf30b49ef5f..4a6c95c923eb4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -98,6 +98,10 @@ struct io_uring_sqe {
 			__u64 addr3;
 			__u64 __pad2[1];
 		};
+		struct {
+			__u64 __pad4[1];
+			__u16 write_hint;
+		};
 		__u64 optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 076171977d5e3..115af82b9151f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4169,6 +4169,8 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(46, __u16, __pad3[0]);
 	BUILD_BUG_SQE_ELEM(48, __u64, addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_SQE_ELEM(48, __u64, __pad4);
+	BUILD_BUG_SQE_ELEM(56, __u16, write_hint);
 	BUILD_BUG_SQE_ELEM(56, __u64, __pad2);
 
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 93526a64ccd60..fdab23424f386 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -279,7 +279,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
-
+	rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);

From patchwork Fri Nov 8 19:36:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 13868882 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org 
(Postfix) with ESMTPS id 4FCED233D72 for ; Fri, 8 Nov 2024 19:48:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731095339; cv=none; b=eHRHueHvxfqnUGUjvTU0iT+U0VyK1NgeG6G/LcypGqR0+Yc0LkMElLx/tdqKiAv+XIXgtBad22INGwKkQp2JLIHMpVcxnLAGZHP4ePerd6Qiq17ZmzmWVbC1zfTwli1soQR4JP4VGVGZ9ElLOh6ZshmB94a4XCl5ZlqZif8RDSw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731095339; c=relaxed/simple; bh=62NxbrkTBvu2ZXfuvt0cM1JrzKX8zfy5nlwp4/oPg+M=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qQnAqB5UHL2vKYCUX6rITBmwQmypPJ1MCqwf6/OafWXGLp+z2OLQS9QxBPtACLM/rBLIy68O43aPRwn5NapHjrSFilJO3qZLQd29PfhxN0fJC+YwDrtDHyILONj5kGlyy5pO33CPZLpfzppftWgksrddXhW6c3PSomXT9251Oo0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=GCRfzGRk; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="GCRfzGRk" Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4A8HPKrB022316 for ; Fri, 8 Nov 2024 11:48:57 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2021-q4; bh=JLqB0bINPP9TBHWvbswRy3CLboznVSFjr0ykLJ11WFw=; b=GCRfzGRk8Fz7 NWWUK9RaGJV34qXK/bfT4kLefN/JzfpmbDtW2YQ9/9pIOnjGLACNxzuS21SzmIMo 
VUQ7Iaq/SR/ZV7ihdHa/JDs4PQC3cUKio9eH1S7TyJgR5gV+WZl/Pf/PFbpYbxOx J+Sydl3vFVHjYEr7yOIShdnqVuxUljOOigu1Gw2edelmhuMH4/PF7UAonflFnZoZ U708qZwsXc3GFN/AaM7SylKsYds6+mrXBf8GHxpsSHrgPZyM93oDqSqYvllASXkI 849H/JwFLrJWH0IOlM4NW2zrPT/ATGLPu0wy9shNZ+jFROj8SNwj8RMxWJq+VjuS kSoL6TGGxA== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42sn58a41r-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 08 Nov 2024 11:48:57 -0800 (PST) Received: from twshared8596.05.ash9.facebook.com (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Fri, 8 Nov 2024 19:48:55 +0000 Received: by devbig638.nha1.facebook.com (Postfix, from userid 544533) id B281E14E3A03D; Fri, 8 Nov 2024 11:36:58 -0800 (PST) From: Keith Busch To: , , , , , CC: , , , , , Keith Busch Subject: [PATCHv11 7/9] block: export placement hint feature Date: Fri, 8 Nov 2024 11:36:27 -0800 Message-ID: <20241108193629.3817619-8-kbusch@meta.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241108193629.3817619-1-kbusch@meta.com> References: <20241108193629.3817619-1-kbusch@meta.com> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: cpS2lzRUp-VzcJ8C0JsQFFjn38i-70uw X-Proofpoint-GUID: cpS2lzRUp-VzcJ8C0JsQFFjn38i-70uw X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 From: Keith Busch Add a feature flag for devices that support generic placement hints in write commands. This is in contrast to data lifetime hints. 
Signed-off-by: Keith Busch
---
 block/blk-settings.c   | 2 ++
 block/blk-sysfs.c      | 3 +++
 include/linux/blkdev.h | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index f9f831f104615..b809f31ad84f2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -518,6 +518,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->features &= ~BLK_FEAT_NOWAIT;
 	if (!(b->features & BLK_FEAT_POLL))
 		t->features &= ~BLK_FEAT_POLL;
+	if (!(b->features & BLK_FEAT_PLACEMENT_HINTS))
+		t->features &= ~BLK_FEAT_PLACEMENT_HINTS;
 
 	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1925ea23bd290..6280c5f89b8b7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -260,6 +260,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
 QUEUE_SYSFS_FEATURE_SHOW(poll, BLK_FEAT_POLL);
 QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
 QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
+QUEUE_SYSFS_FEATURE_SHOW(placement_hints, BLK_FEAT_PLACEMENT_HINTS);
 
 static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
 {
@@ -497,6 +498,7 @@ QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
 QUEUE_RW_ENTRY(queue_wc, "write_cache");
 QUEUE_RO_ENTRY(queue_fua, "fua");
 QUEUE_RO_ENTRY(queue_dax, "dax");
+QUEUE_RO_ENTRY(queue_placement_hints, "placement_hints");
 QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
 QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
 QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
@@ -626,6 +628,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_wc_entry.attr,
 	&queue_fua_entry.attr,
 	&queue_dax_entry.attr,
+	&queue_placement_hints_entry.attr,
 	&queue_poll_delay_entry.attr,
 	&queue_virt_boundary_mask_entry.attr,
 	&queue_dma_alignment_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1477f751ad8bd..2ffe9a3b9dbff 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,9 @@ typedef unsigned int __bitwise blk_features_t;
 #define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
 	((__force blk_features_t)(1u << 15))
 
+/* supports generic write placement hints */
+#define BLK_FEAT_PLACEMENT_HINTS ((__force blk_features_t)(1u << 16))
+
 /*
  * Flags automatically inherited when stacking limits.
  */
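
Note on the blk_stack_limits() hunk in patch 7/9: the added check clears BLK_FEAT_PLACEMENT_HINTS on the top (stacked) limits whenever a bottom device lacks it, so a stacked device can only keep the flag if every component has it. The following is a minimal userspace sketch of that rule, not kernel code. The names features_t, struct limits, stack_features() and stacked_has_placement_hints() are hypothetical stand-ins, and the sketch assumes the stacking driver starts with the flag set on the top limits, which the patch itself does not show.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for blk_features_t and struct queue_limits. */
typedef unsigned int features_t;

/* Bit position copied from the patch; the macro name is shortened here. */
#define FEAT_PLACEMENT_HINTS (1u << 16)

struct limits {
	features_t features;
};

/* Mirrors the added check: drop the flag on top if bottom lacks it. */
static void stack_features(struct limits *t, const struct limits *b)
{
	if (!(b->features & FEAT_PLACEMENT_HINTS))
		t->features &= ~FEAT_PLACEMENT_HINTS;
}

/*
 * Stack n bottom devices onto a top device that is ASSUMED to start with
 * the flag set. Returns 1 if the flag survives, 0 otherwise.
 */
static int stacked_has_placement_hints(const struct limits *bottoms, size_t n)
{
	struct limits top = { .features = FEAT_PLACEMENT_HINTS };

	for (size_t i = 0; i < n; i++)
		stack_features(&top, &bottoms[i]);

	return (top.features & FEAT_PLACEMENT_HINTS) != 0;
}
```

Under that assumption, passing two bottom devices that both advertise the flag keeps it on the stacked device, while a single component without it is enough to clear it. This is the same "all components must support it" pattern the surrounding context lines apply to BLK_FEAT_NOWAIT and BLK_FEAT_POLL.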