Message ID | 20241206015308.3342386-1-kbusch@meta.com (mailing list archive) |
---|---|
Series | block write streams with nvme fdp |
On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> Changes from v10:
>
>  Merged up to block for-6.14/io_uring, which required some
>  new attribute handling.
>
>  Not mixing write hints usage with write streams. This effectively
>  abandons any attempts to use the existing fcntl API for use with
>  filesystems in this series.
>
>  Exporting the stream's reclaim unit nominal size.
>
> Christoph Hellwig (5):
>   fs: add a write stream field to the kiocb
>   block: add a bi_write_stream field
>   block: introduce a write_stream_granularity queue limit
>   block: expose write streams for block device nodes
>   nvme: add a nvme_get_log_lsi helper
>
> Keith Busch (5):
>   io_uring: protection information enhancements
>   io_uring: add write stream attribute
>   block: introduce max_write_streams queue limit
>   nvme: register fdp queue limits
>   nvme: use fdp streams if write stream is provided

I fucked up the format-patch command by omitting a single patch. The
following should have been "PATCH 1/11", but I don't want to resend for
just this:

commit 9e40f4a4da6d0cef871d1c5daf55cc0497fd9c39
Author: Keith Busch <kbusch@kernel.org>
Date:   Tue Nov 19 13:16:15 2024 +0100

    fs: add write stream information to statx

    Add new statx field to report the maximum number of write streams
    supported and the granularity for them.

    Signed-off-by: Keith Busch <kbusch@kernel.org>
    [hch: renamed hints to streams, add granularity]
    Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/stat.c b/fs/stat.c
index 0870e969a8a0b..00e4598b1ff25 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -729,6 +729,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
 	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
 	tmp.stx_atomic_write_segments_max = stat->atomic_write_segments_max;
+	tmp.stx_write_stream_granularity = stat->write_stream_granularity;
+	tmp.stx_write_stream_max = stat->write_stream_max;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c5..36d4dfb291abd 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -57,6 +57,8 @@ struct kstat {
 	u32		atomic_write_unit_min;
 	u32		atomic_write_unit_max;
 	u32		atomic_write_segments_max;
+	u32		write_stream_granularity;
+	u16		write_stream_max;
 };
 
 /* These definitions are internal to the kernel for now. Mainly used by nfsd. */
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a252864416..547c62a1a3a7c 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -132,9 +132,11 @@ struct statx {
 	__u32	stx_atomic_write_unit_max;	/* Max atomic write unit in bytes */
 	/* 0xb0 */
 	__u32	stx_atomic_write_segments_max;	/* Max atomic write segment count */
-	__u32	__spare1[1];
+	__u32	stx_write_stream_granularity;
 	/* 0xb8 */
-	__u64	__spare3[9];	/* Spare space for future expansion */
+	__u16	stx_write_stream_max;
+	__u16	__sparse2[3];
+	__u64	__spare3[8];	/* Spare space for future expansion */
 	/* 0x100 */
 };
 
@@ -164,6 +166,7 @@ struct statx {
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
 #define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_STREAM	0x00020000U	/* Want/got write_stream_* */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
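The uapi hunk above carves the two new fields out of existing spare space, so the 0x100-byte size of struct statx should not change. A minimal sketch that checks this arithmetic follows. The two structs are illustrative stand-ins for the tail of struct statx from offset 0xb0 onward, not the kernel definitions; the names statx_tail_old, statx_tail_new, TAIL_BASE and the two helper functions are assumptions made for this example, and the spare fields are renamed without leading underscores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of stx_atomic_write_segments_max, per the comments in the diff. */
#define TAIL_BASE 0xb0u

/* Hypothetical mirror of the tail of struct statx before the patch. */
struct statx_tail_old {
	uint32_t stx_atomic_write_segments_max;	/* 0xb0 */
	uint32_t spare1[1];			/* 0xb4 */
	uint64_t spare3[9];			/* 0xb8 .. 0x100 */
};

/* Hypothetical mirror of the tail of struct statx after the patch. */
struct statx_tail_new {
	uint32_t stx_atomic_write_segments_max;	/* 0xb0 */
	uint32_t stx_write_stream_granularity;	/* 0xb4, replaces spare1 */
	uint16_t stx_write_stream_max;		/* 0xb8 */
	uint16_t spare2[3];			/* 0xba, pads up to 0xc0 */
	uint64_t spare3[8];			/* 0xc0 .. 0x100 */
};

/* 4 + 4 + 72 bytes before, 4 + 4 + 2 + 6 + 64 bytes after: both 80. */
_Static_assert(sizeof(struct statx_tail_old) == sizeof(struct statx_tail_new),
	       "new fields must be carved out of existing spare space");
_Static_assert(TAIL_BASE + sizeof(struct statx_tail_new) == 0x100,
	       "tail must still end at offset 0x100");

/* Absolute offset of stx_write_stream_max in the modeled layout. */
static inline size_t write_stream_max_offset(void)
{
	return TAIL_BASE + offsetof(struct statx_tail_new, stx_write_stream_max);
}

/* Absolute offset at which the remaining spare area starts. */
static inline size_t spare3_offset(void)
{
	return TAIL_BASE + offsetof(struct statx_tail_new, spare3);
}
```

If the model is right, the file compiles without tripping either static assertion, and the helpers report 0xb8 and 0xc0, matching the offset comments kept in the hunk.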
Note: I skipped back to this because v12 only had the log vs v11.

On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
>
>  Not mixing write hints usage with write streams. This effectively
>  abandons any attempts to use the existing fcntl API for use with
>  filesystems in this series.

That's not true as far as I can tell given that this is basically the
architecture from my previous posting. The block code still maps the
rw hints into write streams, and file systems can do exactly the same.
You just need to talk to the fs maintainers and convince them it's a
good thing for their particular file system. Especially for simple
file systems that task should not be too hard, even if they might want
to set a stream or two aside for fs usage. Similarly a file system
can implement the stream based API.
On Mon, Dec 09, 2024 at 01:51:32PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> >
> >  Not mixing write hints usage with write streams. This effectively
> >  abandons any attempts to use the existing fcntl API for use with
> >  filesystems in this series.
>
> That's not true as far as I can tell given that this is basically the
> architecture from my previous posting. The block code still maps the
> rw hints into write streams, and file systems can do exactly the same.
> You just need to talk to the fs maintainers and convince them it's a
> good thing for their particular file system. Especially for simple
> file systems that task should not be too hard, even if they might want
> to set a stream or two aside for fs usage. Similarly a file system
> can implement the stream based API.

Sorry for my confusing message here. I meant *this series* doesn't
attempt to use streams with filesystems (I wasn't considering raw block
in the same category as a traditional filesystem).

I am not abandoning follow-on efforts to make use of these elsewhere. I
just don't want the open topics to distract from the less controversial
parts, and this series doesn't prevent or harm future innovations there,
so I think we're pretty well aligned up to this point.
Hi,

I was under the impression that passing write hints via fcntl() on any
legacy filesystem stays. The hint is attached to the inode, and the fs
simply picks it up from there when sending it down with write related
to that inode.
Aka per file write hint.

Am I right?

Pierre

> -----Original Message-----
> From: Christoph Hellwig <hch@lst.de>
> Sent: Monday, December 9, 2024 4:52 AM
> To: Keith Busch <kbusch@meta.com>
> Cc: axboe@kernel.dk; hch@lst.de; linux-block@vger.kernel.org;
> linux-nvme@lists.infradead.org; linux-fsdevel@vger.kernel.org;
> io-uring@vger.kernel.org; sagi@grimberg.me; asml.silence@gmail.com;
> Keith Busch <kbusch@kernel.org>
> Subject: [EXT] Re: [PATCHv11 00/10] block write streams with nvme fdp
>
> Note: I skipped back to this because v12 only had the log vs v11.
>
> On Thu, Dec 05, 2024 at 05:52:58PM -0800, Keith Busch wrote:
> >
> >  Not mixing write hints usage with write streams. This effectively
> >  abandons any attempts to use the existing fcntl API for use with
> >  filesystems in this series.
>
> That's not true as far as I can tell given that this is basically the
> architecture from my previous posting. The block code still maps the
> rw hints into write streams, and file systems can do exactly the same.
> You just need to talk to the fs maintainers and convince them it's a
> good thing for their particular file system. Especially for simple
> file systems that task should not be too hard, even if they might want
> to set a stream or two aside for fs usage. Similarly a file system
> can implement the stream based API.
On Mon, Dec 09, 2024 at 05:14:16PM +0000, Pierre Labat wrote:
> I was under the impression that passing write hints via fcntl() on any
> legacy filesystem stays. The hint is attached to the inode, and the fs
> simply picks it up from there when sending it down with write related
> to that inode.
> Aka per file write hint.
>
> Am I right?

Nothing is changing with respect to those write hints as a result of
this series, if that's what you mean. The driver hadn't been checking
the write hint before, and this patch set continues that pre-existing
behavior. For this series, the driver utilizes a new field:
"write_stream".

Mapping the inode write hint to an FDP stream for other filesystems
remains an open topic to follow on later.
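For readers unfamiliar with the per-file write hint discussed in this exchange, the following is a minimal userspace sketch of the existing fcntl() interface (F_SET_RW_HINT and F_GET_RW_HINT with the RWH_WRITE_LIFE_* values). It is not part of the series and does not touch the new "write_stream" field. The helper names set_and_read_back_hint and demo_hint_on_temp_file are hypothetical, the /tmp scratch path is an assumption, and the fallback macro values are taken from the Linux UAPI header for builds against older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Fallbacks for libc headers that predate these definitions; the values
 * match <linux/fcntl.h> (F_LINUX_SPECIFIC_BASE is 1024). */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035	/* F_LINUX_SPECIFIC_BASE + 11 */
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/* Hypothetical helper: set the write lifetime hint on the inode behind fd,
 * then read it back. Returns 0 on success or a negative errno value. */
static inline int set_and_read_back_hint(int fd, uint64_t hint, uint64_t *out)
{
	assert(out != NULL);
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	if (fcntl(fd, F_GET_RW_HINT, out) < 0)
		return -errno;
	return 0;
}

/* Hypothetical demo: try the hint on a scratch file and clean up.
 * Returns 0 on success or a negative errno value. */
static inline int demo_hint_on_temp_file(uint64_t *out)
{
	char path[] = "/tmp/rw-hint-XXXXXX";
	int fd = mkstemp(path);
	int rc;

	if (fd < 0)
		return -errno;
	rc = set_and_read_back_hint(fd, RWH_WRITE_LIFE_SHORT, out);
	close(fd);
	unlink(path);
	return rc;
}
```

A caller would check the return value before trusting the read-back hint, since kernels or sandboxes without support for these fcntl commands are expected to fail the call rather than store the hint.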
Thanks Keith for the clarification.

If I got it right, it will be decided later by the filesystem
maintainers whether they want to convert the write hint assigned to a
file via fcntl() into a write_stream, which is the field used by the
block drivers (for FDP with NVMe).

Regards,
Pierre

> -----Original Message-----
> From: Keith Busch <kbusch@kernel.org>
> Sent: Monday, December 9, 2024 9:25 AM
> To: Pierre Labat <plabat@micron.com>
> Cc: Christoph Hellwig <hch@lst.de>; Keith Busch <kbusch@meta.com>;
> axboe@kernel.dk; linux-block@vger.kernel.org;
> linux-nvme@lists.infradead.org; linux-fsdevel@vger.kernel.org;
> io-uring@vger.kernel.org; sagi@grimberg.me; asml.silence@gmail.com
> Subject: Re: [EXT] Re: [PATCHv11 00/10] block write streams with nvme fdp
>
> On Mon, Dec 09, 2024 at 05:14:16PM +0000, Pierre Labat wrote:
> > I was under the impression that passing write hints via fcntl() on any
> > legacy filesystem stays. The hint is attached to the inode, and the fs
> > simply picks it up from there when sending it down with write related
> > to that inode.
> > Aka per file write hint.
> >
> > Am I right?
>
> Nothing is changing with respect to those write hints as a result of
> this series, if that's what you mean. The driver hadn't been checking
> the write hint before, and this patch set continues that pre-existing
> behavior. For this series, the driver utilizes a new field:
> "write_stream".
>
> Mapping the inode write hint to an FDP stream for other filesystems
> remains an open topic to follow on later.
From: Keith Busch <kbusch@kernel.org>

Changes from v10:

 Merged up to block for-6.14/io_uring, which required some
 new attribute handling.

 Not mixing write hints usage with write streams. This effectively
 abandons any attempts to use the existing fcntl API for use with
 filesystems in this series.

 Exporting the stream's reclaim unit nominal size.

Christoph Hellwig (5):
  fs: add a write stream field to the kiocb
  block: add a bi_write_stream field
  block: introduce a write_stream_granularity queue limit
  block: expose write streams for block device nodes
  nvme: add a nvme_get_log_lsi helper

Keith Busch (5):
  io_uring: protection information enhancements
  io_uring: add write stream attribute
  block: introduce max_write_streams queue limit
  nvme: register fdp queue limits
  nvme: use fdp streams if write stream is provided

 Documentation/ABI/stable/sysfs-block |  15 +++
 block/bdev.c                         |   6 +
 block/bio.c                          |   2 +
 block/blk-crypto-fallback.c          |   1 +
 block/blk-merge.c                    |   4 +
 block/blk-sysfs.c                    |   6 +
 block/bounce.c                       |   1 +
 block/fops.c                         |  23 ++++
 drivers/nvme/host/core.c             | 160 ++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   5 +
 include/linux/blk_types.h            |   1 +
 include/linux/blkdev.h               |  16 +++
 include/linux/fs.h                   |   1 +
 include/linux/nvme.h                 |  73 ++++++++++++
 include/uapi/linux/io_uring.h        |  21 +++-
 io_uring/rw.c                        |  38 ++++++-
 16 files changed, 359 insertions(+), 14 deletions(-)