[PATCHv11,00/10] block write streams with nvme fdp

Message ID 20241206015308.3342386-1-kbusch@meta.com (mailing list archive)

Message

Keith Busch Dec. 6, 2024, 1:52 a.m. UTC
From: Keith Busch <kbusch@kernel.org>

Changes from v10:

 Merged up to block for-6.14/io_uring, which required some
 new attribute handling.

 Not mixing write hints usage with write streams. This effectively
 abandons any attempts to use the existing fcntl API for use with
 filesystems in this series.

 Exporting the stream's reclaim unit nominal size.

Christoph Hellwig (5):
  fs: add a write stream field to the kiocb
  block: add a bi_write_stream field
  block: introduce a write_stream_granularity queue limit
  block: expose write streams for block device nodes
  nvme: add a nvme_get_log_lsi helper

Keith Busch (5):
  io_uring: protection information enhancements
  io_uring: add write stream attribute
  block: introduce max_write_streams queue limit
  nvme: register fdp queue limits
  nvme: use fdp streams if write stream is provided

 Documentation/ABI/stable/sysfs-block |  15 +++
 block/bdev.c                         |   6 +
 block/bio.c                          |   2 +
 block/blk-crypto-fallback.c          |   1 +
 block/blk-merge.c                    |   4 +
 block/blk-sysfs.c                    |   6 +
 block/bounce.c                       |   1 +
 block/fops.c                         |  23 ++++
 drivers/nvme/host/core.c             | 160 ++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   5 +
 include/linux/blk_types.h            |   1 +
 include/linux/blkdev.h               |  16 +++
 include/linux/fs.h                   |   1 +
 include/linux/nvme.h                 |  73 ++++++++++++
 include/uapi/linux/io_uring.h        |  21 +++-
 io_uring/rw.c                        |  38 ++++++-
 16 files changed, 359 insertions(+), 14 deletions(-)

Comments

Keith Busch Dec. 6, 2024, 2:18 a.m. UTC | #1
On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> Changes from v10:
> 
>  Merged up to block for-6.14/io_uring, which required some
>  new attribute handling.
> 
>  Not mixing write hints usage with write streams. This effectively
>  abandons any attempts to use the existing fcntl API for use with
>  filesystems in this series.
> 
>  Exporting the stream's reclaim unit nominal size.
> 
> Christoph Hellwig (5):
>   fs: add a write stream field to the kiocb
>   block: add a bi_write_stream field
>   block: introduce a write_stream_granularity queue limit
>   block: expose write streams for block device nodes
>   nvme: add a nvme_get_log_lsi helper
> 
> Keith Busch (5):
>   io_uring: protection information enhancements
>   io_uring: add write stream attribute
>   block: introduce max_write_streams queue limit
>   nvme: register fdp queue limits
>   nvme: use fdp streams if write stream is provided

I fucked up the format-patch command by omitting a single patch. The
following should have been "PATCH 1/11", but I don't want to resend for
just this:

commit 9e40f4a4da6d0cef871d1c5daf55cc0497fd9c39
Author: Keith Busch <kbusch@kernel.org>
Date:   Tue Nov 19 13:16:15 2024 +0100

    fs: add write stream information to statx
    
    Add new statx field to report the maximum number of write streams
    supported and the granularity for them.
    
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    [hch: renamed hints to streams, add granularity]
    Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/stat.c b/fs/stat.c
index 0870e969a8a0b..00e4598b1ff25 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -729,6 +729,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
 	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
 	tmp.stx_atomic_write_segments_max = stat->atomic_write_segments_max;
+	tmp.stx_write_stream_granularity = stat->write_stream_granularity;
+	tmp.stx_write_stream_max = stat->write_stream_max;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c5..36d4dfb291abd 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -57,6 +57,8 @@ struct kstat {
 	u32		atomic_write_unit_min;
 	u32		atomic_write_unit_max;
 	u32		atomic_write_segments_max;
+	u32		write_stream_granularity;
+	u16		write_stream_max;
 };
 
 /* These definitions are internal to the kernel for now. Mainly used by nfsd. */
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a252864416..547c62a1a3a7c 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -132,9 +132,11 @@ struct statx {
 	__u32	stx_atomic_write_unit_max;	/* Max atomic write unit in bytes */
 	/* 0xb0 */
 	__u32   stx_atomic_write_segments_max;	/* Max atomic write segment count */
-	__u32   __spare1[1];
+	__u32   stx_write_stream_granularity;
 	/* 0xb8 */
-	__u64	__spare3[9];	/* Spare space for future expansion */
+	__u16   stx_write_stream_max;
+	__u16	__spare2[3];
+	__u64	__spare3[8];	/* Spare space for future expansion */
 	/* 0x100 */
 };
 
@@ -164,6 +166,7 @@ struct statx {
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
 #define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_STREAM	0x00020000U	/* Want/got write_stream_* */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
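
The uapi hunk above carves the two new fields out of existing spare space, so `struct statx` should keep its 0x100-byte size. The following is a minimal sketch that checks the arithmetic at compile time. The struct names `statx_tail_old` and `statx_tail_new` are hypothetical and model only the bytes from offset 0xb0 onward; the spare members are renamed without leading underscores because such identifiers are reserved in user code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the tail of struct statx before the patch,
 * starting at offset 0xb0. */
struct statx_tail_old {
	uint32_t stx_atomic_write_segments_max;	/* 0xb0 */
	uint32_t spare1[1];			/* 0xb4 */
	uint64_t spare3[9];			/* 0xb8 .. 0xff */
};

/* The same region after the patch: one u32 and one u16 are taken
 * from the spare space, and the remainder is padded back out. */
struct statx_tail_new {
	uint32_t stx_atomic_write_segments_max;	/* 0xb0 */
	uint32_t stx_write_stream_granularity;	/* 0xb4 */
	uint16_t stx_write_stream_max;		/* 0xb8 */
	uint16_t spare2[3];			/* 0xba */
	uint64_t spare3[8];			/* 0xc0 .. 0xff */
};

/* Both layouts must cover exactly 0x100 - 0xb0 = 0x50 bytes. */
static_assert(sizeof(struct statx_tail_old) == 0x100 - 0xb0,
	      "old tail must end at 0x100");
static_assert(sizeof(struct statx_tail_new) == 0x100 - 0xb0,
	      "new tail must end at 0x100");

/* The new fields land at 0xb4 and 0xb8, relative to the 0xb0 base. */
static_assert(offsetof(struct statx_tail_new, stx_write_stream_granularity) == 0x04,
	      "granularity at 0xb4");
static_assert(offsetof(struct statx_tail_new, stx_write_stream_max) == 0x08,
	      "max streams at 0xb8");
```

The byte count works out as 2 + 3 * 2 + 8 * 8 = 72, which matches the 9 * 8 = 72 bytes of the old `__spare3[9]` array.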

Christoph Hellwig Dec. 9, 2024, 12:51 p.m. UTC | #2
Note: I skipped back to this because v12 only had the log vs v11.

On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> 
>  Not mixing write hints usage with write streams. This effectively
>  abandons any attempts to use the existing fcntl API for use with
>  filesystems in this series.

That's not true as far as I can tell given that this is basically the
architecture from my previous posting.  The block code still maps the
rw hints into write streams, and file systems can do exactly the same.
You just need to talk to the fs maintainers and convince them it's a
good thing for their particular file system.  Especially for simple
file systems that task should not be too hard, even if they might want
to set a stream or two aside for fs usage.  Similarly a file system
can implement the stream based API.
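
As an illustration only, one way a simple file system might map a lifetime hint onto a write stream while keeping a few streams for its own use is sketched below. The function `hint_to_stream`, its clamping policy, and the `reserved` parameter are hypothetical and are not taken from this series. The sketch also assumes that stream 0 means "no stream" and that valid streams are numbered from 1. The `HINT_*` values mirror the `RWH_WRITE_LIFE_*` constants from the uapi headers.

```c
#include <stdint.h>

/* Local copies of the RWH_WRITE_LIFE_* values from <linux/fcntl.h>. */
enum {
	HINT_NOT_SET = 0,
	HINT_NONE    = 1,
	HINT_SHORT   = 2,
	HINT_MEDIUM  = 3,
	HINT_LONG    = 4,
	HINT_EXTREME = 5,
};

/*
 * Hypothetical policy: streams 1..reserved are kept for the file
 * system's own use, and the four lifetime hints are spread over the
 * remaining streams. When fewer than four streams remain, the longer
 * lifetimes share the last usable stream. Returns 0 for "no stream".
 */
static inline uint16_t hint_to_stream(unsigned int hint,
				      uint16_t max_streams,
				      uint16_t reserved)
{
	uint16_t usable, idx;

	/* No lifetime information: leave the write unassigned. */
	if (hint < HINT_SHORT || hint > HINT_EXTREME)
		return 0;

	/* Nothing left once the reserved streams are set aside. */
	if (max_streams <= reserved)
		return 0;

	usable = max_streams - reserved;
	idx = (uint16_t)(hint - HINT_SHORT);	/* 0..3 */
	if (idx >= usable)
		idx = usable - 1;

	return (uint16_t)(reserved + idx + 1);	/* streams are 1-based */
}
```

With 8 streams and 2 reserved, `HINT_SHORT` would map to stream 3 and `HINT_EXTREME` to stream 6. Whether any file system wants a policy like this is exactly the discussion with its maintainers.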

Keith Busch Dec. 9, 2024, 3:57 p.m. UTC | #3
On Mon, Dec 09, 2024 at 01:51:32PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> > 
> >  Not mixing write hints usage with write streams. This effectively
> >  abandons any attempts to use the existing fcntl API for use with
> >  filesystems in this series.
> 
> That's not true as far as I can tell given that this is basically the
> architecture from my previous posting.  The block code still maps the
> rw hints into write streams, and file systems can do exactly the same.
> You just need to talk to the fs maintainers and convince them it's a
> good thing for their particular file system.  Especially for simple
> file systems that task should not be too hard, even if they might want
> to set a stream or two aside for fs usage.  Similarly a file system
> can implement the stream based API.

Sorry for my confusing message here. I meant *this series* doesn't
attempt to use streams with filesystems (I wasn't considering raw block
in the same category as traditional filesystems).

I am not abandoning follow-on efforts to make use of these elsewhere. I
just don't want the open topics to distract from the less controversial
parts, and this series doesn't prevent or harm future innovations there,
so I think we're pretty well aligned up to this point.

Pierre Labat Dec. 9, 2024, 5:14 p.m. UTC | #4
Micron Confidential

Hi,

I was under the impression that passing write hints via fcntl() on any legacy filesystem stays. The hint is attached to the inode, and the fs simply picks it up from there when sending down writes related to that inode.
In other words, a per-file write hint.
Am I right?

Pierre



Keith Busch Dec. 9, 2024, 5:25 p.m. UTC | #5
On Mon, Dec 09, 2024 at 05:14:16PM +0000, Pierre Labat wrote:
> I was under the impression that passing write hints via fcntl() on any
> legacy filesystem stays. The hint is attached to the inode, and the fs
> simply picks it up from there when sending it down with write related
> to that inode.
> Aka per file write hint.
>
> I am right?

Nothing is changing with respect to those write hints as a result of
this series, if that's what you mean. The driver hadn't been checking
the write hint before, and this patch set continues that pre-existing
behavior. For this series, the driver utilizes a new field:
"write_stream".

Mapping the inode write hint to an FDP stream for other filesystems
remains an open topic to follow on later.
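
For reference, the existing per-file hint discussed here is set from userspace with the `F_SET_RW_HINT` fcntl, which takes a pointer to a 64-bit value and stores the hint on the inode. A minimal sketch follows. The helper names `set_file_write_hint` and `get_file_write_hint` are hypothetical, and the fallback defines are only there for older libc headers that do not provide the constants. Per the explanation above, nothing in this series makes the nvme driver act on this hint.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for libc headers that predate these definitions. */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035	/* F_LINUX_SPECIFIC_BASE + 11 */
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/* Attach a lifetime hint to the inode behind fd; 0 or -errno. */
static inline int set_file_write_hint(int fd, uint64_t hint)
{
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	return 0;
}

/* Read back the hint currently stored on the inode; 0 or -errno. */
static inline int get_file_write_hint(int fd, uint64_t *hint)
{
	if (fcntl(fd, F_GET_RW_HINT, hint) < 0)
		return -errno;
	return 0;
}
```

A caller would typically run `set_file_write_hint(fd, RWH_WRITE_LIFE_SHORT)` once after `open()`. What the file system and lower layers do with the value is up to them.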

Pierre Labat Dec. 9, 2024, 5:35 p.m. UTC | #6
Thanks Keith for the clarification.
If I got it right, the filesystem maintainers will decide later whether they want to convert the write hint assigned to a file via fcntl() into a write_stream, which is the field used by the block drivers (for FDP in the case of nvme).
Regards,
Pierre
