diff mbox series

[2/2] block: remove the per-bio/request write hint

Message ID 20220304175556.407719-2-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [1/2] nvme: remove support or stream based temperature hint | expand

Commit Message

Christoph Hellwig March 4, 2022, 5:55 p.m. UTC
With the NVMe support for this gone, there are no consumers of these hints
left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c                 |  2 --
 block/blk-crypto-fallback.c |  1 -
 block/blk-merge.c           | 14 --------------
 block/blk-mq-debugfs.c      | 24 ------------------------
 block/blk-mq.c              |  1 -
 block/bounce.c              |  1 -
 block/fops.c                |  3 ---
 drivers/md/raid1.c          |  2 --
 drivers/md/raid5-ppl.c      | 28 +++-------------------------
 drivers/md/raid5.c          |  6 ------
 fs/btrfs/extent_io.c        |  1 -
 fs/buffer.c                 | 13 +++++--------
 fs/direct-io.c              |  3 ---
 fs/ext4/page-io.c           |  5 +----
 fs/f2fs/data.c              |  2 --
 fs/gfs2/lops.c              |  1 -
 fs/iomap/buffered-io.c      |  2 --
 fs/iomap/direct-io.c        |  1 -
 fs/mpage.c                  |  1 -
 fs/zonefs/super.c           |  1 -
 include/linux/blk_types.h   |  1 -
 include/linux/blkdev.h      |  3 ---
 22 files changed, 9 insertions(+), 107 deletions(-)

Comments

Jens Axboe March 4, 2022, 7:24 p.m. UTC | #1
On 3/4/22 10:55 AM, Christoph Hellwig wrote:
> With the NVMe support for this gone, there are no consumers of these hints
> left, so remove them.

Looks good to me, didn't find any missed spots.
Dave Chinner March 4, 2022, 10:12 p.m. UTC | #2
On Fri, Mar 04, 2022 at 06:55:56PM +0100, Christoph Hellwig wrote:
> With the NVMe support for this gone, there are no consumers of these hints
> left, so remove them.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bio.c                 |  2 --
>  block/blk-crypto-fallback.c |  1 -
>  block/blk-merge.c           | 14 --------------
>  block/blk-mq-debugfs.c      | 24 ------------------------
>  block/blk-mq.c              |  1 -
>  block/bounce.c              |  1 -
>  block/fops.c                |  3 ---
>  drivers/md/raid1.c          |  2 --
>  drivers/md/raid5-ppl.c      | 28 +++-------------------------
>  drivers/md/raid5.c          |  6 ------
>  fs/btrfs/extent_io.c        |  1 -
>  fs/buffer.c                 | 13 +++++--------
>  fs/direct-io.c              |  3 ---
>  fs/ext4/page-io.c           |  5 +----
>  fs/f2fs/data.c              |  2 --
>  fs/gfs2/lops.c              |  1 -
>  fs/iomap/buffered-io.c      |  2 --
>  fs/iomap/direct-io.c        |  1 -
>  fs/mpage.c                  |  1 -
>  fs/zonefs/super.c           |  1 -
>  include/linux/blk_types.h   |  1 -
>  include/linux/blkdev.h      |  3 ---
>  22 files changed, 9 insertions(+), 107 deletions(-)

AFAICT, all the filesystem/IO path passthrough plumbing for hints is
now gone, and no hardware will ever receive hints.  Doesn't this
mean that file_write_hint(), file->f_write_hint and iocb->ki_hint
are now completely unused, too?  That also means the fcntl()s for
F_{G,S}ET_FILE_RW_HINT now have zero effect on IO, so should they be
neutered, too?

AFAICT, this patch leaves just the f2fs allocator usage of
inode->i_rw_hint to select a segment to allocate from as the
remaining consumer of this entire plumbing and user API. Is that
used by applications anywhere, or can that be removed and so the
rest of the infrastructure get removed and the fcntl()s no-op'd or
-EOPNOTSUPP?

Cheers,

Dave.
Christoph Hellwig March 5, 2022, 5:19 a.m. UTC | #3
On Sat, Mar 05, 2022 at 09:12:55AM +1100, Dave Chinner wrote:
> AFAICT, all the filesystem/IO path passthrough plumbing for hints is
> now gone, and no hardware will ever receive hints.  Doesn't this
> mean that file_write_hint(), file->f_write_hint and iocb->ki_hint
> are now completely unused, too?

No, for the reason tha you state below.  f2fs still uses it.

> AFAICT, this patch leaves just the f2fs allocator usage of
> inode->i_rw_hint to select a segment to allocate from as the
> remaining consumer of this entire plumbing and user API. Is that
> used by applications anywhere, or can that be removed and so the
> rest of the infrastructure get removed and the fcntl()s no-op'd or
> -EOPNOTSUPP?

I was told it is used quite heavily in android.
Dave Chinner March 5, 2022, 9:40 p.m. UTC | #4
On Sat, Mar 05, 2022 at 06:19:29AM +0100, Christoph Hellwig wrote:
> On Sat, Mar 05, 2022 at 09:12:55AM +1100, Dave Chinner wrote:
> > AFAICT, all the filesystem/IO path passthrough plumbing for hints is
> > now gone, and no hardware will ever receive hints.  Doesn't this
> > mean that file_write_hint(), file->f_write_hint and iocb->ki_hint
> > are now completely unused, too?
> 
> No, for the reason tha you state below.  f2fs still uses it.

My point is that f2fs uses i_write_hint, not f_write_hint or
ki_hint. IOWs, nothing in the IO path use the iocb or file write
hints anymore because they only ever got used to set the hint for
bios. It's now unused information.

According to the io_uring ppl, setup of unnecessary fields in the
iocb has a measurable cost and they've done work to minimise it in
the past. So if these fields are not actually used by anyone in the
IO path, why should we still pay the cost calling
ki_hint_validate(file_write_hint(file)) when setting up an iocb?

> > AFAICT, this patch leaves just the f2fs allocator usage of
> > inode->i_rw_hint to select a segment to allocate from as the
> > remaining consumer of this entire plumbing and user API. Is that
> > used by applications anywhere, or can that be removed and so the
> > rest of the infrastructure get removed and the fcntl()s no-op'd or
> > -EOPNOTSUPP?
> 
> I was told it is used quite heavily in android.

So it's primarily used by out of tree code? And that after this
patch, there's really no way to test that this API does anything
useful at all?

Cheers,

Dave.
Bart Van Assche March 5, 2022, 11:55 p.m. UTC | #5
On 3/5/22 13:40, Dave Chinner wrote:
> On Sat, Mar 05, 2022 at 06:19:29AM +0100, Christoph Hellwig wrote:
>> On Sat, Mar 05, 2022 at 09:12:55AM +1100, Dave Chinner wrote:
>>> AFAICT, this patch leaves just the f2fs allocator usage of
>>> inode->i_rw_hint to select a segment to allocate from as the
>>> remaining consumer of this entire plumbing and user API. Is that
>>> used by applications anywhere, or can that be removed and so the
>>> rest of the infrastructure get removed and the fcntl()s no-op'd or
>>> -EOPNOTSUPP?
>>
>> I was told it is used quite heavily in android.
> 
> So it's primarily used by out of tree code? And that after this
> patch, there's really no way to test that this API does anything
> useful at all?

Hi Dave,

Android kernel developers follow the "upstream first" policy for core kernel 
code (this means all kernel code other than kernel drivers that implement 
support for the phone SoC). As a result, the Android 13 F2FS implementation is 
very close to the upstream F2FS code. So the statement above about "out of tree 
code" is not correct.

Bart.
Jens Axboe March 6, 2022, 5:11 p.m. UTC | #6
On 3/5/22 2:40 PM, Dave Chinner wrote:
> On Sat, Mar 05, 2022 at 06:19:29AM +0100, Christoph Hellwig wrote:
>> On Sat, Mar 05, 2022 at 09:12:55AM +1100, Dave Chinner wrote:
>>> AFAICT, all the filesystem/IO path passthrough plumbing for hints is
>>> now gone, and no hardware will ever receive hints.  Doesn't this
>>> mean that file_write_hint(), file->f_write_hint and iocb->ki_hint
>>> are now completely unused, too?
>>
>> No, for the reason tha you state below.  f2fs still uses it.
> 
> My point is that f2fs uses i_write_hint, not f_write_hint or
> ki_hint. IOWs, nothing in the IO path use the iocb or file write
> hints anymore because they only ever got used to set the hint for
> bios. It's now unused information.
> 
> According to the io_uring ppl, setup of unnecessary fields in the
> iocb has a measurable cost and they've done work to minimise it in
> the past. So if these fields are not actually used by anyone in the
> IO path, why should we still pay the cost calling
> ki_hint_validate(file_write_hint(file)) when setting up an iocb?

Yes, I think we should kill it. If we retain the inode hint, the f2fs
doesn't need a any changes. And it should be safe to make the per-file
fcntl hints return EINVAL, which they would on older kernels anyway.
Untested, but something like the below.


diff --git a/fs/aio.c b/fs/aio.c
index 4ceba13a7db0..eb0948bb74f1 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1478,7 +1478,6 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 	req->ki_flags = iocb_flags(req->ki_filp);
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
-	req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
 		/*
 		 * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 753986ea1583..bc7c7a7d9260 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -138,7 +138,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	ki->iocb.ki_filp	= file;
 	ki->iocb.ki_pos		= start_pos + skipped;
 	ki->iocb.ki_flags	= IOCB_DIRECT;
-	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->skipped		= skipped;
 	ki->object		= object;
@@ -313,7 +312,6 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 	ki->iocb.ki_filp	= file;
 	ki->iocb.ki_pos		= start_pos;
 	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
-	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->object		= object;
 	ki->inval_counter	= cres->inval_counter;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3c98ef6af97d..45076c01a2ba 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4479,10 +4479,8 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	const bool do_opu = f2fs_lfs_mode(sbi);
-	const int whint_mode = F2FS_OPTION(sbi).whint_mode;
 	const loff_t pos = iocb->ki_pos;
 	const ssize_t count = iov_iter_count(from);
-	const enum rw_hint hint = iocb->ki_hint;
 	unsigned int dio_flags;
 	struct iomap_dio *dio;
 	ssize_t ret;
@@ -4515,8 +4513,6 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 		if (do_opu)
 			down_read(&fi->i_gc_rwsem[READ]);
 	}
-	if (whint_mode == WHINT_MODE_OFF)
-		iocb->ki_hint = WRITE_LIFE_NOT_SET;
 
 	/*
 	 * We have to use __iomap_dio_rw() and iomap_dio_complete() instead of
@@ -4539,8 +4535,6 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 		ret = iomap_dio_complete(dio);
 	}
 
-	if (whint_mode == WHINT_MODE_OFF)
-		iocb->ki_hint = hint;
 	if (do_opu)
 		up_read(&fi->i_gc_rwsem[READ]);
 	up_read(&fi->i_gc_rwsem[WRITE]);
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 9c6c6a3e2de5..e26444e977e1 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -292,21 +292,8 @@ static long fcntl_rw_hint(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case F_GET_FILE_RW_HINT:
-		h = file_write_hint(file);
-		if (copy_to_user(argp, &h, sizeof(*argp)))
-			return -EFAULT;
-		return 0;
 	case F_SET_FILE_RW_HINT:
-		if (copy_from_user(&h, argp, sizeof(h)))
-			return -EFAULT;
-		hint = (enum rw_hint) h;
-		if (!rw_hint_valid(hint))
-			return -EINVAL;
-
-		spin_lock(&file->f_lock);
-		file->f_write_hint = hint;
-		spin_unlock(&file->f_lock);
-		return 0;
+		return -EINVAL;
 	case F_GET_RW_HINT:
 		h = inode->i_write_hint;
 		if (copy_to_user(argp, &h, sizeof(*argp)))
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 23e7f93d3956..02400fd00501 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3790,7 +3790,6 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	if (unlikely(!(req->file->f_mode & FMODE_WRITE)))
 		return -EBADF;
-	req->rw.kiocb.ki_hint = ki_hint_validate(file_write_hint(req->file));
 	return io_prep_rw(req, sqe);
 }
 
diff --git a/fs/open.c b/fs/open.c
index 9ff2f621b760..1315253e0247 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -835,7 +835,6 @@ static int do_dentry_open(struct file *f,
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 
-	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..a1fc3b41cd82 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -327,7 +327,6 @@ struct kiocb {
 	void (*ki_complete)(struct kiocb *iocb, long ret);
 	void			*private;
 	int			ki_flags;
-	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
 	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
 	randomized_struct_fields_end
@@ -967,7 +966,6 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
-	enum rw_hint		f_write_hint;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -2215,31 +2213,13 @@ static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns,
 	       !gid_valid(i_gid_into_mnt(mnt_userns, inode));
 }
 
-static inline enum rw_hint file_write_hint(struct file *file)
-{
-	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
-		return file->f_write_hint;
-
-	return file_inode(file)->i_write_hint;
-}
-
 static inline int iocb_flags(struct file *file);
 
-static inline u16 ki_hint_validate(enum rw_hint hint)
-{
-	typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
-
-	if (hint <= max_hint)
-		return hint;
-	return 0;
-}
-
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = iocb_flags(filp),
-		.ki_hint = ki_hint_validate(file_write_hint(filp)),
 		.ki_ioprio = get_current_ioprio(),
 	};
 }
@@ -2250,7 +2230,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = kiocb_src->ki_flags,
-		.ki_hint = kiocb_src->ki_hint,
 		.ki_ioprio = kiocb_src->ki_ioprio,
 		.ki_pos = kiocb_src->ki_pos,
 	};
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index f701bb23f83c..1779e133cea0 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -956,12 +956,11 @@ TRACE_EVENT(f2fs_direct_IO_enter,
 		__entry->rw	= rw;
 	),
 
-	TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu ki_flags = %x ki_hint = %x ki_ioprio = %x rw = %d",
+	TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu ki_flags = %x ki_ioprio = %x rw = %d",
 		show_dev_ino(__entry),
 		__entry->iocb->ki_pos,
 		__entry->len,
 		__entry->iocb->ki_flags,
-		__entry->iocb->ki_hint,
 		__entry->iocb->ki_ioprio,
 		__entry->rw)
 );
Christoph Hellwig March 6, 2022, 6:01 p.m. UTC | #7
On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
> Yes, I think we should kill it. If we retain the inode hint, the f2fs
> doesn't need a any changes. And it should be safe to make the per-file
> fcntl hints return EINVAL, which they would on older kernels anyway.
> Untested, but something like the below.

I've sent this off to the testing farm this morning, but EINVAL might
be even better:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/more-hint-removal
Jens Axboe March 6, 2022, 6:06 p.m. UTC | #8
On 3/6/22 11:01 AM, Christoph Hellwig wrote:
> On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
>> Yes, I think we should kill it. If we retain the inode hint, the f2fs
>> doesn't need a any changes. And it should be safe to make the per-file
>> fcntl hints return EINVAL, which they would on older kernels anyway.
>> Untested, but something like the below.
> 
> I've sent this off to the testing farm this morning, but EINVAL might
> be even better:
> 
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/more-hint-removal

I do think EINVAL is better, as it just tells the app it's not available
like we would've done before. With just doing zeroes, that might break
applications that set-and-verify. Of course there's also the risk of
that since we retain inode hints (so they work), but fail file hints.
That's a lesser risk though, and we only know of the inode hints being
used.
Dave Chinner March 6, 2022, 11:17 p.m. UTC | #9
On Sun, Mar 06, 2022 at 11:06:12AM -0700, Jens Axboe wrote:
> On 3/6/22 11:01 AM, Christoph Hellwig wrote:
> > On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
> >> Yes, I think we should kill it. If we retain the inode hint, the f2fs
> >> doesn't need a any changes. And it should be safe to make the per-file
> >> fcntl hints return EINVAL, which they would on older kernels anyway.
> >> Untested, but something like the below.
> > 
> > I've sent this off to the testing farm this morning, but EINVAL might
> > be even better:
> > 
> > http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/more-hint-removal

Yup, I like that.

> I do think EINVAL is better, as it just tells the app it's not available
> like we would've done before. With just doing zeroes, that might break
> applications that set-and-verify. Of course there's also the risk of
> that since we retain inode hints (so they work), but fail file hints.
> That's a lesser risk though, and we only know of the inode hints being
> used.

Agreed, I think EINVAL would be better here - jsut make it behave
like it would on a kernel that never supported this functionality in
the first place. Seems simpler to me for user applications if we do
that.

Cheers,

Dave.
Paul Menzel March 9, 2022, 8:24 a.m. UTC | #10
Dear Manjong,


Am 09.03.22 um 14:31 schrieb Manjong Lee:

Just a small note, that your message date is from the future. Please 
check your system clock.


Kind regards,

Paul
Manjong Lee March 9, 2022, 1:31 p.m. UTC | #11
>On Sun, ddMar 06, 2022 at 11:06:12AM -0700, Jens Axboe wrote:
>> On 3/6/22 11:01 AM, Christoph Hellwig wrote:
>> > On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
>> >> Yes, I think we should kill it. If we retain the inode hint, the f2fs
>> >> doesn't need a any changes. And it should be safe to make the per-file
>> >> fcntl hints return EINVAL, which they would on older kernels anyway.
>> >> Untested, but something like the below.
>> > 
>> > I've sent this off to the testing farm this morning, but EINVAL might
>> > be even better:
>> > 
>> > http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/more-hint-removal
>
>Yup, I like that.
>
>> I do think EINVAL is better, as it just tells the app it's not available
>> like we would've done before. With just doing zeroes, that might break
>> applications that set-and-verify. Of course there's also the risk of
>> that since we retain inode hints (so they work), but fail file hints.
>> That's a lesser risk though, and we only know of the inode hints being
>> used.
>
>Agreed, I think EINVAL would be better here - jsut make it behave
>like it would on a kernel that never supported this functionality in
>the first place. Seems simpler to me for user applications if we do
>that.
>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>david@fromorbit.com
>

Currently, UFS device also supports hot/cold data separation 
and uses existing write_hint code.

In other words, the function is also being used in storage other than NVMe,
and if it is removed, it is thought that there will be an operation problem.

If the code is removed, I am worried about how other devices
that use the function.

Is there a good alternative?
Luca Porzio March 10, 2022, 11:34 a.m. UTC | #12
> -----Original Message-----
> From: Manjong Lee <mj0123.lee@samsung.com>
> Sent: Wednesday, March 9, 2022 2:31 PM
> To: david@fromorbit.com
> Cc: axboe@kernel.dk; hch@lst.de; kbusch@kernel.org; linux-
> block@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-
> nvme@lists.infradead.org; linux-raid@vger.kernel.org; sagi@grimberg.me;
> song@kernel.org; seunghwan.hyun@samsung.com;
> sookwan7.kim@samsung.com; nanich.lee@samsung.com;
> woosung2.lee@samsung.com; yt0928.kim@samsung.com;
> junho89.kim@samsung.com; jisoo2146.oh@samsung.com
> Subject: [EXT] Re: [PATCH 2/2] block: remove the per-bio/request write hint.
> 
> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
> you recognize the sender and were expecting this message.
> 
> 
> >On Sun, ddMar 06, 2022 at 11:06:12AM -0700, Jens Axboe wrote:
> >> On 3/6/22 11:01 AM, Christoph Hellwig wrote:
> >> > On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
> >> >> Yes, I think we should kill it. If we retain the inode hint, the
> >> >> f2fs doesn't need a any changes. And it should be safe to make the
> >> >> per-file fcntl hints return EINVAL, which they would on older kernels
> anyway.
> >> >> Untested, but something like the below.
> >> >
> >> > I've sent this off to the testing farm this morning, but EINVAL
> >> > might be even better:
> >> >
> >> > https://urldefense.com/v3/__http://git.infradead.org/users/hch/bloc
> >> > k.git/shortlog/refs/heads/more-hint-
> removal__;!!KZTdOCjhgt4hgw!qsgy
> >> >
> oejchUYPeorpCL0Ov3jPGvXpXgxa7hpSCViD7XQy7uJDMDLo3U8v_bmoUtg$
> >
> >Yup, I like that.
> >
> >> I do think EINVAL is better, as it just tells the app it's not
> >> available like we would've done before. With just doing zeroes, that
> >> might break applications that set-and-verify. Of course there's also
> >> the risk of that since we retain inode hints (so they work), but fail file
> hints.
> >> That's a lesser risk though, and we only know of the inode hints
> >> being used.
> >
> >Agreed, I think EINVAL would be better here - jsut make it behave like
> >it would on a kernel that never supported this functionality in the
> >first place. Seems simpler to me for user applications if we do that.
> >
> >Cheers,
> >
> >Dave.
> >--
> >Dave Chinner
> >david@fromorbit.com
> >
> 
> Currently, UFS device also supports hot/cold data separation and uses
> existing write_hint code.
> 
> In other words, the function is also being used in storage other than NVMe,
> and if it is removed, it is thought that there will be an operation problem.
> 
> If the code is removed, I am worried about how other devices that use the
> function.
> 
> Is there a good alternative?

Hi all,

I work for Micron UFS team. I confirm and support Manjong message above.
There are UFS customers using custom write_hint in Android and due to the 
"upstream first" policy from Google, if you remove write_hints in block device,
The Android ecosystem will suffer this lack.

Can we revert back this decision? Or think of an alternative solution which 
may work?

Cheers,
   Luca
Jens Axboe March 10, 2022, 12:15 p.m. UTC | #13
On 3/10/22 4:34 AM, Luca Porzio (lporzio) wrote:
>> -----Original Message-----
>> From: Manjong Lee <mj0123.lee@samsung.com>
>> Sent: Wednesday, March 9, 2022 2:31 PM
>> To: david@fromorbit.com
>> Cc: axboe@kernel.dk; hch@lst.de; kbusch@kernel.org; linux-
>> block@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-
>> nvme@lists.infradead.org; linux-raid@vger.kernel.org; sagi@grimberg.me;
>> song@kernel.org; seunghwan.hyun@samsung.com;
>> sookwan7.kim@samsung.com; nanich.lee@samsung.com;
>> woosung2.lee@samsung.com; yt0928.kim@samsung.com;
>> junho89.kim@samsung.com; jisoo2146.oh@samsung.com
>> Subject: [EXT] Re: [PATCH 2/2] block: remove the per-bio/request write hint.
>>
>> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
>> you recognize the sender and were expecting this message.
>>
>>
>>> On Sun, ddMar 06, 2022 at 11:06:12AM -0700, Jens Axboe wrote:
>>>> On 3/6/22 11:01 AM, Christoph Hellwig wrote:
>>>>> On Sun, Mar 06, 2022 at 10:11:46AM -0700, Jens Axboe wrote:
>>>>>> Yes, I think we should kill it. If we retain the inode hint, the
>>>>>> f2fs doesn't need a any changes. And it should be safe to make the
>>>>>> per-file fcntl hints return EINVAL, which they would on older kernels
>> anyway.
>>>>>> Untested, but something like the below.
>>>>>
>>>>> I've sent this off to the testing farm this morning, but EINVAL
>>>>> might be even better:
>>>>>
>>>>> https://urldefense.com/v3/__http://git.infradead.org/users/hch/bloc
>>>>> k.git/shortlog/refs/heads/more-hint-
>> removal__;!!KZTdOCjhgt4hgw!qsgy
>>>>>
>> oejchUYPeorpCL0Ov3jPGvXpXgxa7hpSCViD7XQy7uJDMDLo3U8v_bmoUtg$
>>>
>>> Yup, I like that.
>>>
>>>> I do think EINVAL is better, as it just tells the app it's not
>>>> available like we would've done before. With just doing zeroes, that
>>>> might break applications that set-and-verify. Of course there's also
>>>> the risk of that since we retain inode hints (so they work), but fail file
>> hints.
>>>> That's a lesser risk though, and we only know of the inode hints
>>>> being used.
>>>
>>> Agreed, I think EINVAL would be better here - jsut make it behave like
>>> it would on a kernel that never supported this functionality in the
>>> first place. Seems simpler to me for user applications if we do that.
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@fromorbit.com
>>>
>>
>> Currently, UFS device also supports hot/cold data separation and uses
>> existing write_hint code.
>>
>> In other words, the function is also being used in storage other than NVMe,
>> and if it is removed, it is thought that there will be an operation problem.
>>
>> If the code is removed, I am worried about how other devices that use the
>> function.
>>
>> Is there a good alternative?
> 
> Hi all,
> 
> I work for Micron UFS team. I confirm and support Manjong message above.
> There are UFS customers using custom write_hint in Android and due to the 
> "upstream first" policy from Google, if you remove write_hints in block device,
> The Android ecosystem will suffer this lack.
> 
> Can we revert back this decision? Or think of an alternative solution which 
> may work?

You do both realize that this is just the file specific hint? Inode
based hints will still work fine for UFS.
Christoph Hellwig March 10, 2022, 2:21 p.m. UTC | #14
On Thu, Mar 10, 2022 at 11:34:40AM +0000, Luca Porzio (lporzio) wrote:
> I work for Micron UFS team. I confirm and support Manjong message above.
> There are UFS customers using custom write_hint in Android and due to the 
> "upstream first" policy from Google, if you remove write_hints in block device,
> The Android ecosystem will suffer this lack.
> 
> Can we revert back this decision? Or think of an alternative solution which 
> may work?

Hell no.  Given that these custmomers never gave a singe fuck to actually
support whatever crap they came up with upstream we're not going to leave
dead code to encurage them to keep doing this.
Luca Porzio March 10, 2022, 6:50 p.m. UTC | #15
Micron Confidential

> 
> You do both realize that this is just the file specific hint? Inode based hints
> will still work fine for UFS.
> 
> --
> Jens Axboe

Jens,

Thanks for this reply.

This whole patch series removes support for per-bio write_hint. 
Without bio write_hint, F2FS won't be able to cascade Hot/Warm/Cold 
information to SCSI / UFS driver.

This is my current understanding. I might be wrong but I don't think we 
Are concerned with inode hint (as well as file hints).

Thanks,
  Luca


Micron Confidential
Luca Porzio March 10, 2022, 6:51 p.m. UTC | #16
Micron Confidential


> > Can we revert back this decision? Or think of an alternative solution
> > which may work?
> 
> Hell no.  Given that these custmomers never gave a singe fuck to actually
> support whatever crap they came up with upstream we're not going to leave
> dead code to encurage them to keep doing this.

You are starting from the wrong assumption: I'm just saying this is not dead code
but it is used across the (Android) ecosystem.

Micron Confidential
Jens Axboe March 10, 2022, 7:10 p.m. UTC | #17
On 3/10/22 11:50 AM, Luca Porzio (lporzio) wrote:
> Micron Confidential
> 
>>
>> You do both realize that this is just the file specific hint? Inode based hints
>> will still work fine for UFS.
>>
>> --
>> Jens Axboe
> 
> Jens,
> 
> Thanks for this reply.
> 
> This whole patch series removes support for per-bio write_hint. 
> Without bio write_hint, F2FS won't be able to cascade Hot/Warm/Cold 
> information to SCSI / UFS driver.
> 
> This is my current understanding. I might be wrong but I don't think we 
> Are concerned with inode hint (as well as file hints).

But ufs/scsi doesn't use it in mainline, as far as I can tell. So how
does that work?
Christoph Hellwig March 10, 2022, 7:14 p.m. UTC | #18
On Thu, Mar 10, 2022 at 06:51:39PM +0000, Luca Porzio (lporzio) wrote:
> You are starting from the wrong assumption: I'm just saying this is not dead code
> but it is used across the (Android) ecosystem.

No one gives cares about random forks, so no I'm not starting from the
wrong assumptіon.  You on the other hand seem to have a lot of learning
to do if you thing some random kernel fork matters the slightest about
what is considered dead code.
Bart Van Assche March 10, 2022, 7:34 p.m. UTC | #19
On 3/10/22 11:10, Jens Axboe wrote:
> On 3/10/22 11:50 AM, Luca Porzio (lporzio) wrote:
>> Micron Confidential
>>
>>>
>>> You do both realize that this is just the file specific hint? Inode based hints
>>> will still work fine for UFS.
>>>
>>> --
>>> Jens Axboe
>>
>> Jens,
>>
>> Thanks for this reply.
>>
>> This whole patch series removes support for per-bio write_hint.
>> Without bio write_hint, F2FS won't be able to cascade Hot/Warm/Cold
>> information to SCSI / UFS driver.
>>
>> This is my current understanding. I might be wrong but I don't think we
>> Are concerned with inode hint (as well as file hints).
> 
> But ufs/scsi doesn't use it in mainline, as far as I can tell. So how
> does that work?

Hi Luca,

I'm not aware of any Android branch on which the UFS driver or the SCSI 
core uses bi_write_hint or the struct request write_hint member. Did I 
perhaps overlook something?

Thanks,

Bart.
Bean Huo March 10, 2022, 9:52 p.m. UTC | #20
Micron Confidential

> >>
> >>>
> >>> You do both realize that this is just the file specific hint? Inode
> >>> based hints will still work fine for UFS.
> >>>
> >>> --
> >>> Jens Axboe
> >>
> >> Jens,
> >>
> >> Thanks for this reply.
> >>
> >> This whole patch series removes support for per-bio write_hint.
> >> Without bio write_hint, F2FS won't be able to cascade Hot/Warm/Cold
> >> information to SCSI / UFS driver.
> >>
> >> This is my current understanding. I might be wrong but I don't think
> >> we Are concerned with inode hint (as well as file hints).
> >
> > But ufs/scsi doesn't use it in mainline, as far as I can tell. So how
> > does that work?
> 
> Hi Luca,
> 
> I'm not aware of any Android branch on which the UFS driver or the SCSI core
> uses bi_write_hint or the struct request write_hint member. Did I perhaps
> overlook something?
> 
> Thanks,
> 


Bart,

Yes, in upstream linux and upstream android, there is no such code. But as we know,
mobile customers have used bio->bi_write_hint in their products for years. And the
group ID is set according to bio->bi_write_hint before passing the CDB to UFS.


	lrbp = &hba->lrb[tag];
 
              WARN_ON(lrbp->cmd);
             + if(cmd->cmnd[0] == WRITE_10)
              +{
                +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
              +}             
              lrbp->cmd = cmd;
              lrbp->sense_bufflen = UFS_SENSE_SIZE;
              lrbp->sense_buffer = cmd->sense_buffer;

I don't know why they don't push these changes to the community, maybe
it's because changes across the file system and block layers are unacceptable to the
block layer and FS. but for sure we should now warn them to push to the
community as soon as possible. 

Bean



Micron Confidential
Jens Axboe March 10, 2022, 10:10 p.m. UTC | #21
On 3/10/22 2:52 PM, Bean Huo (beanhuo) wrote:
> Micron Confidential
> 
>>>>
>>>>>
>>>>> You do both realize that this is just the file specific hint? Inode
>>>>> based hints will still work fine for UFS.
>>>>>
>>>>> --
>>>>> Jens Axboe
>>>>
>>>> Jens,
>>>>
>>>> Thanks for this reply.
>>>>
>>>> This whole patch series removes support for per-bio write_hint.
>>>> Without bio write_hint, F2FS won't be able to cascade Hot/Warm/Cold
>>>> information to SCSI / UFS driver.
>>>>
>>>> This is my current understanding. I might be wrong but I don't think
>>>> we Are concerned with inode hint (as well as file hints).
>>>
>>> But ufs/scsi doesn't use it in mainline, as far as I can tell. So how
>>> does that work?
>>
>> Hi Luca,
>>
>> I'm not aware of any Android branch on which the UFS driver or the SCSI core
>> uses bi_write_hint or the struct request write_hint member. Did I perhaps
>> overlook something?
>>
>> Thanks,
>>
> 
> 
> Bart,
> 
> Yes, in upstream linux and upstream android, there is no such code.
> But as we know, mobile customers have used bio->bi_write_hint in their
> products for years. And the group ID is set according to
> bio->bi_write_hint before passing the CDB to UFS.
> 
> 
> 	lrbp = &hba->lrb[tag];
>  
>               WARN_ON(lrbp->cmd);
>              + if(cmd->cmnd[0] == WRITE_10)
>               +{
>                 +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
>               +}             
>               lrbp->cmd = cmd;
>               lrbp->sense_bufflen = UFS_SENSE_SIZE;
>               lrbp->sense_buffer = cmd->sense_buffer;
> 
> I don't know why they don't push these changes to the community, maybe
> it's because changes across the file system and block layers are
> unacceptable to the block layer and FS. but for sure we should now
> warn them to push to the community as soon as possible. 

If the code isn't upstream, it's a bit late to start thinking about
that now. This feature has existed for 5 years at this point, and the
only consumer was NVMe. The upstream kernel cares only about what is
in-tree, as that is the only part we can modify and fix. We
change/modify internal kernel APIs all the time, which is how tech debt
is removed and the long term sanity of the project is maintained. This
in turn means that out-of-tree code will break, that's just a natural
side effect and something we can't do anything about.

If at some point there's a desire to actually try and upstream this
support, then we'll be happy to review that patchset. Or you can
continue to stay out-of-tree and just patch in what you need. If you're
already modifying core code, then that shouldn't be a problem.
Bart Van Assche March 10, 2022, 10:18 p.m. UTC | #22
On 3/10/22 13:52, Bean Huo (beanhuo) wrote:
> Yes, in upstream linux and upstream android, there is no such code. But as we know,
> mobile customers have used bio->bi_write_hint in their products for years. And the
> group ID is set according to bio->bi_write_hint before passing the CDB to UFS.
> 
> 
> 	lrbp = &hba->lrb[tag];
>   
>                WARN_ON(lrbp->cmd);
>               + if(cmd->cmnd[0] == WRITE_10)
>                +{
>                  +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
>                +}
>                lrbp->cmd = cmd;
>                lrbp->sense_bufflen = UFS_SENSE_SIZE;
>                lrbp->sense_buffer = cmd->sense_buffer;
> 
> I don't know why they don't push these changes to the community, maybe
> it's because changes across the file system and block layers are unacceptable to the
> block layer and FS. but for sure we should now warn them to push to the
> community as soon as possible.

Thanks Bean for having shared this information. I think the above code sets the GROUP
NUMBER information in the WRITE(10) command and also that the following text from the
UFS specification applies to that information:
<quote>
GROUP NUMBER: Notifies the Target device that the data linked to a ContextID:
  -----------------------------------------------------------------------------------------
     GROUP NUMBER Value     |  Function
  -----------------------------------------------------------------------------------------
  00000b                    | Default, no Context ID is associated with the read operation.
  00001b to 01111b (0XXXXb) | Context ID. (XXXX I from 0001b to 1111b ‐ Context ID value)
  10000b                    | Data has System Data characteristics
  10001b to 11111b          | Reserved
  -----------------------------------------------------------------------------------------

In case the GROUP NUMBER is set to a reserved value, the operation shall fail and a status
response of CHECK CONDITION will be returned along with the sense key set to ILLEGAL REQUEST.
</quote>

Since there is a desire to remove the write hint information from struct bio, is there
any other information the "system data characteristics" information can be derived from?
How about e.g. deriving that information from request flags like REQ_SYNC, REQ_META and/or
REQ_IDLE?

Thanks,

Bart.
Eric Biggers March 11, 2022, 5:06 a.m. UTC | #23
On Thu, Mar 10, 2022 at 06:51:39PM +0000, Luca Porzio (lporzio) wrote:
>
> Micron Confidential

This is a public mailing list, so please do not use this header/footer.

> it is used across the (Android) ecosystem.

So why hasn't it been submitted upstream?

- Eric
Eric Biggers March 11, 2022, 5:31 a.m. UTC | #24
On Thu, Mar 10, 2022 at 02:18:19PM -0800, Bart Van Assche wrote:
> On 3/10/22 13:52, Bean Huo (beanhuo) wrote:
> > Yes, in upstream linux and upstream android, there is no such code. But as we know,
> > mobile customers have used bio->bi_write_hint in their products for years. And the
> > group ID is set according to bio->bi_write_hint before passing the CDB to UFS.
> > 
> > 
> > 	lrbp = &hba->lrb[tag];
> >                WARN_ON(lrbp->cmd);
> >               + if(cmd->cmnd[0] == WRITE_10)
> >                +{
> >                  +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
> >                +}
> >                lrbp->cmd = cmd;
> >                lrbp->sense_bufflen = UFS_SENSE_SIZE;
> >                lrbp->sense_buffer = cmd->sense_buffer;
> > 
> > I don't know why they don't push these changes to the community, maybe
> > it's because changes across the file system and block layers are unacceptable to the
> > block layer and FS. but for sure we should now warn them to push to the
> > community as soon as possible.
> 
> Thanks Bean for having shared this information. I think the above code sets the GROUP
> NUMBER information in the WRITE(10) command and also that the following text from the
> UFS specification applies to that information:
> <quote>
> GROUP NUMBER: Notifies the Target device that the data linked to a ContextID:
>  -----------------------------------------------------------------------------------------
>     GROUP NUMBER Value     |  Function
>  -----------------------------------------------------------------------------------------
>  00000b                    | Default, no Context ID is associated with the read operation.
>  00001b to 01111b (0XXXXb) | Context ID. (XXXX I from 0001b to 1111b ‐ Context ID value)
>  10000b                    | Data has System Data characteristics
>  10001b to 11111b          | Reserved
>  -----------------------------------------------------------------------------------------
> 
> In case the GROUP NUMBER is set to a reserved value, the operation shall fail and a status
> response of CHECK CONDITION will be returned along with the sense key set to ILLEGAL REQUEST.
> </quote>
> 
> Since there is a desire to remove the write hint information from struct bio, is there
> any other information the "system data characteristics" information can be derived from?
> How about e.g. deriving that information from request flags like REQ_SYNC, REQ_META and/or
> REQ_IDLE?
> 

[+Cc linux-f2fs-devel]

I think the f2fs developers will need to chime in here, as it looks like f2fs
uses the write hints for different data categories like hot/cold/warm.  I'm not
sure those can be fully represented by other bio flags.

Either way, the good news is that it sounds like this "GROUP NUMBER" thing is
part of the UFS standard.  So whatever the best way to support it is, it can
just be submitted upstream like any other standard UFS feature.  Why hasn't that
been done?

- Eric
Luca Porzio March 11, 2022, 9:21 a.m. UTC | #25
> Since there is a desire to remove the write hint information from struct bio, is
> there any other information the "system data characteristics" information
> can be derived from?
> How about e.g. deriving that information from request flags like REQ_SYNC,
> REQ_META and/or REQ_IDLE?

Bart,

I agree with all your analysis. The point is these flags (Sync, Meta and Idle) are 
really meant for latency management whereas here the problem is hot/cold separation.

At this moment I am not aware of any methodology we can use to map S/M/Idle flags
Into hot/cold meaning but I am open to talk and discuss :-)

> 
> Thanks,
> 
> Bart.

Cheers,
  Luca
Luca Porzio March 11, 2022, 9:23 a.m. UTC | #26
> 
> On Thu, Mar 10, 2022 at 06:51:39PM +0000, Luca Porzio (lporzio) wrote:
> >
> > Micron Confidential
> 
> This is a public mailing list, so please do not use this header/footer.

Eric,

I am sorry, sometimes it is hard for me to cope with all the company proxies/firewall.
I'll try to make sure this won't happen in future.

> 
> > it is used across the (Android) ecosystem.
> 
> So why hasn't it been submitted upstream?

I'm not sure about this but I am open to fix this for future.

> 
> - Eric

Luca
Bart Van Assche March 11, 2022, 4:45 p.m. UTC | #27
On 3/10/22 14:10, Jens Axboe wrote:
> On 3/10/22 2:52 PM, Bean Huo (beanhuo) wrote:
>> Yes, in upstream linux and upstream android, there is no such code.
>> But as we know, mobile customers have used bio->bi_write_hint in their
>> products for years. And the group ID is set according to
>> bio->bi_write_hint before passing the CDB to UFS.
>>
>>
>> 	lrbp = &hba->lrb[tag];
>>   
>>                WARN_ON(lrbp->cmd);
>>               + if(cmd->cmnd[0] == WRITE_10)
>>                +{
>>                  +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
>>                +}
>>                lrbp->cmd = cmd;
>>                lrbp->sense_bufflen = UFS_SENSE_SIZE;
>>                lrbp->sense_buffer = cmd->sense_buffer;
>>
>> I don't know why they don't push these changes to the community, maybe
>> it's because changes across the file system and block layers are
>> unacceptable to the block layer and FS. but for sure we should now
>> warn them to push to the community as soon as possible.
> 
> If the code isn't upstream, it's a bit late to start thinking about
> that now. This feature has existed for 5 years at this point, and the
> only consumer was NVMe. The upstream kernel cares only about what is
> in-tree, as that is the only part we can modify and fix. We
> change/modify internal kernel APIs all the time, which is how tech debt
> is removed and the long term sanity of the project is maintained. This
> in turn means that out-of-tree code will break, that's just a natural
> side effect and something we can't do anything about.
> 
> If at some point there's a desire to actually try and upstream this
> support, then we'll be happy to review that patchset. Or you can
> continue to stay out-of-tree and just patch in what you need. If you're
> already modifying core code, then that shouldn't be a problem.

Hi Jens,

The "upstream first" policy applies to the Android kernel (see also 
https://arstechnica.com/gadgets/2021/09/android-to-take-an-upstream-first-development-model-for-the-linux-kernel/). 
If anyone requests inclusion in the Android kernel tree of a patch that 
is not upstream, that request is rejected unless a very strong reason 
can be provided why it should be included in the Android kernel only 
instead of being sent upstream. It is not clear to me why the patch Bean 
mentioned is not upstream nor in the upstream Android kernel tree.

 From a UFS vendor I received the feedback that the F2FS write hint 
information helps to reduce write amplification significantly. If the 
write hint information is retained in the upstream kernel I can help 
with making sure that the UFS patch mentioned above is integrated in the 
upstream Linux kernel.

Thanks,

Bart.
Jens Axboe March 11, 2022, 4:56 p.m. UTC | #28
On 3/11/22 9:45 AM, Bart Van Assche wrote:
> On 3/10/22 14:10, Jens Axboe wrote:
>> On 3/10/22 2:52 PM, Bean Huo (beanhuo) wrote:
>>> Yes, in upstream linux and upstream android, there is no such code.
>>> But as we know, mobile customers have used bio->bi_write_hint in their
>>> products for years. And the group ID is set according to
>>> bio->bi_write_hint before passing the CDB to UFS.
>>>
>>>
>>>     lrbp = &hba->lrb[tag];
>>>                  WARN_ON(lrbp->cmd);
>>>               + if(cmd->cmnd[0] == WRITE_10)
>>>                +{
>>>                  +             cmd->cmnd[6] = (0x1f& cmd->request->bio->bi_write_hint);
>>>                +}
>>>                lrbp->cmd = cmd;
>>>                lrbp->sense_bufflen = UFS_SENSE_SIZE;
>>>                lrbp->sense_buffer = cmd->sense_buffer;
>>>
>>> I don't know why they don't push these changes to the community, maybe
>>> it's because changes across the file system and block layers are
>>> unacceptable to the block layer and FS. but for sure we should now
>>> warn them to push to the community as soon as possible.
>>
>> If the code isn't upstream, it's a bit late to start thinking about
>> that now. This feature has existed for 5 years at this point, and the
>> only consumer was NVMe. The upstream kernel cares only about what is
>> in-tree, as that is the only part we can modify and fix. We
>> change/modify internal kernel APIs all the time, which is how tech debt
>> is removed and the long term sanity of the project is maintained. This
>> in turn means that out-of-tree code will break, that's just a natural
>> side effect and something we can't do anything about.
>>
>> If at some point there's a desire to actually try and upstream this
>> support, then we'll be happy to review that patchset. Or you can
>> continue to stay out-of-tree and just patch in what you need. If you're
>> already modifying core code, then that shouldn't be a problem.
> 
> Hi Jens,
> 
> The "upstream first" policy applies to the Android kernel (see also
> https://arstechnica.com/gadgets/2021/09/android-to-take-an-upstream-first-development-model-for-the-linux-kernel/).
> If anyone requests inclusion in the Android kernel tree of a patch
> that is not upstream, that request is rejected unless a very strong
> reason can be provided why it should be included in the Android kernel
> only instead of being sent upstream. It is not clear to me why the
> patch Bean mentioned is not upstream nor in the upstream Android
> kernel tree.
> 
> From a UFS vendor I received the feedback that the F2FS write hint
> information helps to reduce write amplification significantly. If the
> write hint information is retained in the upstream kernel I can help
> with making sure that the UFS patch mentioned above is integrated in
> the upstream Linux kernel.

I'm really not that interested at this point, and I don't want to gate
removal or inclusion of code on some potential future event that may
never happen.

That doesn't mean that the work and process can't continue on the
Android front, the only difference is what the baseline kernel looks
like for the submitted patchset.

Hence I do think we should go ahead as planned, and then we'll just
revisit the topic if/when some actual code comes up.
Avi Shchislowski March 14, 2022, 7:40 a.m. UTC | #29
> >
> > Hi Jens,
> >
> > The "upstream first" policy applies to the Android kernel (see also
> > https://arstechnica.com/gadgets/2021/09/android-to-take-an-upstream-
> first-development-model-for-the-linux-kernel/).
> > If anyone requests inclusion in the Android kernel tree of a patch
> > that is not upstream, that request is rejected unless a very strong
> > reason can be provided why it should be included in the Android kernel
> > only instead of being sent upstream. It is not clear to me why the
> > patch Bean mentioned is not upstream nor in the upstream Android
> > kernel tree.
> >
> > From a UFS vendor I received the feedback that the F2FS write hint
> > information helps to reduce write amplification significantly. If the
> > write hint information is retained in the upstream kernel I can help
> > with making sure that the UFS patch mentioned above is integrated in
> > the upstream Linux kernel.
> 
> I'm really not that interested at this point, and I don't want to gate removal or
> inclusion of code on some potential future event that may never happen.
> 
> That doesn't mean that the work and process can't continue on the Android
> front, the only difference is what the baseline kernel looks like for the
> submitted patchset.
> 
> Hence I do think we should go ahead as planned, and then we'll just revisit
> the topic if/when some actual code comes up.
> 
We also supports Samsung & Micron approach and sorry to see that this functionality
has been removed.

Cheers,
Avi
> --
> Jens Axboe
>
Christoph Hellwig March 14, 2022, 8 a.m. UTC | #30
Which part of "the Linux kernel does not keep unused code around" is still
not clear to you after explaining it three times in this thread?
Eric Biggers March 14, 2022, 7:50 p.m. UTC | #31
On Mon, Mar 14, 2022 at 07:40:42AM +0000, Avi Shchislowski wrote:
>
> We also supports Samsung & Micron approach

Great, so send some patches upstream to have it be supported properly.

- Eric
Jens Axboe March 14, 2022, 7:58 p.m. UTC | #32
On 3/14/22 1:40 AM, Avi Shchislowski wrote:
>>>
>>> Hi Jens,
>>>
>>> The "upstream first" policy applies to the Android kernel (see also
>>> https://arstechnica.com/gadgets/2021/09/android-to-take-an-upstream-
>> first-development-model-for-the-linux-kernel/).
>>> If anyone requests inclusion in the Android kernel tree of a patch
>>> that is not upstream, that request is rejected unless a very strong
>>> reason can be provided why it should be included in the Android kernel
>>> only instead of being sent upstream. It is not clear to me why the
>>> patch Bean mentioned is not upstream nor in the upstream Android
>>> kernel tree.
>>>
>>> From a UFS vendor I received the feedback that the F2FS write hint
>>> information helps to reduce write amplification significantly. If the
>>> write hint information is retained in the upstream kernel I can help
>>> with making sure that the UFS patch mentioned above is integrated in
>>> the upstream Linux kernel.
>>
>> I'm really not that interested at this point, and I don't want to gate removal or
>> inclusion of code on some potential future event that may never happen.
>>
>> That doesn't mean that the work and process can't continue on the Android
>> front, the only difference is what the baseline kernel looks like for the
>> submitted patchset.
>>
>> Hence I do think we should go ahead as planned, and then we'll just revisit
>> the topic if/when some actual code comes up.
>>
> We also supports Samsung & Micron approach and sorry to see that this
> functionality has been removed.

This isn't some setup to solicit votes on who supports what. If the code
isn't upstream, it by definition doesn't exist to the kernel. No amount
of "we're also interested in this" changes that.

What I wrote earlier still applies - whoever is interested in supporting
lifetime hints should submit that code upstream. The existing patchset
to clean this up doesn't change that process AT ALL. As mentioned, the
only difference is what the baseline looks like in terms of what the
patchset is based on.
Luca Porzio March 15, 2022, 3:36 p.m. UTC | #33
> 
> This isn't some setup to solicit votes on who supports what. If the code isn't
> upstream, it by definition doesn't exist to the kernel. No amount of "we're
> also interested in this" changes that.
> 
> What I wrote earlier still applies - whoever is interested in supporting lifetime
> hints should submit that code upstream. The existing patchset to clean this
> up doesn't change that process AT ALL. As mentioned, the only difference is
> what the baseline looks like in terms of what the patchset is based on.
> 

Jens, 

Actually we might work to issue a patch and revert the patch plus add the code 
that Bean and Bart mentioned which is currently Android only.
The reason it has not been done before is because for now it's not production yet
but it may soon be that case.

Would this patch revert be an option and accepted as a closure for this discussion?

Another option (which I actually prefer), if I ask for a MM & Storage BoF discussion 
on storage hints where I can show you the status of temperature management
and my studies on how this is beneficial for storage devices. 

Would this be more beneficial and maybe get some wider consensus 
on the write hints?

After that consensus reverting (or agreeing on a new approach) will be easier.

> --
> Jens Axboe

Cheers,
   Luca
Jens Axboe March 15, 2022, 3:44 p.m. UTC | #34
On 3/15/22 9:36 AM, Luca Porzio (lporzio) wrote:
>>
>> This isn't some setup to solicit votes on who supports what. If the code isn't
>> upstream, it by definition doesn't exist to the kernel. No amount of "we're
>> also interested in this" changes that.
>>
>> What I wrote earlier still applies - whoever is interested in supporting lifetime
>> hints should submit that code upstream. The existing patchset to clean this
>> up doesn't change that process AT ALL. As mentioned, the only difference is
>> what the baseline looks like in terms of what the patchset is based on.
>>
> 
> Jens, 
> 
> Actually we might work to issue a patch and revert the patch plus add
> the code that Bean and Bart mentioned which is currently Android only.
> The reason it has not been done before is because for now it's not
> production yet but it may soon be that case.
> 
> Would this patch revert be an option and accepted as a closure for
> this discussion?

What patch revert? It's not clear to me which patch you're talking about
here. If you're talking about the "remove the per-bio/request write
hint" patch, then no, that's certainly not being reverted. See previous
replies I made and also below for why, and let's please stop beating
this dead horse.

> Another option (which I actually prefer), if I ask for a MM & Storage
> BoF discussion on storage hints where I can show you the status of
> temperature management and my studies on how this is beneficial for
> storage devices. 

As long as it's accompanied by code that implements it, then that would
be fine.

> Would this be more beneficial and maybe get some wider consensus on
> the write hints?
> 
> After that consensus reverting (or agreeing on a new approach) will be
> easier.

As I've said multiple times, whenever code is available, it'll be
reviewed and discussed. I don't like to discuss hypotheticals as too
many times in the past there's a promise made and expectations built
only for nothing to materialize. As it stands, the only in-kernel user
of the hints is gone, and that means that the support code is being
removed. We NEVER keep code in the kernel that doesn't have a user, as
it can't get tested.

Submit your patches when they are ready, it really has no bearing on the
currently queued up changes to write hints.
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index b15f5466ce084..88e63bb187482 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -257,7 +257,6 @@  void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_opf = opf;
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
-	bio->bi_write_hint = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -737,7 +736,6 @@  static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 	    bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
-	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 
 	bio_clone_blkg_association(bio, bio_src);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 18c8eafe20b94..7c854584b52b5 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -170,7 +170,6 @@  static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_opf		= bio_src->bi_opf;
 	bio->bi_ioprio		= bio_src->bi_ioprio;
-	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index f5255991b773c..0e871d4e7cb8d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -782,13 +782,6 @@  static struct request *attempt_merge(struct request_queue *q,
 	    !blk_write_same_mergeable(req->bio, next->bio))
 		return NULL;
 
-	/*
-	 * Don't allow merge of different write hints, or for a hint with
-	 * non-hint IO.
-	 */
-	if (req->write_hint != next->write_hint)
-		return NULL;
-
 	if (req->ioprio != next->ioprio)
 		return NULL;
 
@@ -915,13 +908,6 @@  bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	    !blk_write_same_mergeable(rq->bio, bio))
 		return false;
 
-	/*
-	 * Don't allow merge of different write hints, or for a hint with
-	 * non-hint IO.
-	 */
-	if (rq->write_hint != bio->bi_write_hint)
-		return false;
-
 	if (rq->ioprio != bio_prio(bio))
 		return false;
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3a790eb4995c6..c2904c75c160f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -183,35 +183,11 @@  static ssize_t queue_state_write(void *data, const char __user *buf,
 	return count;
 }
 
-static int queue_write_hint_show(void *data, struct seq_file *m)
-{
-	struct request_queue *q = data;
-	int i;
-
-	for (i = 0; i < BLK_MAX_WRITE_HINTS; i++)
-		seq_printf(m, "hint%d: %llu\n", i, q->write_hints[i]);
-
-	return 0;
-}
-
-static ssize_t queue_write_hint_store(void *data, const char __user *buf,
-				      size_t count, loff_t *ppos)
-{
-	struct request_queue *q = data;
-	int i;
-
-	for (i = 0; i < BLK_MAX_WRITE_HINTS; i++)
-		q->write_hints[i] = 0;
-
-	return count;
-}
-
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{ "poll_stat", 0400, queue_poll_stat_show },
 	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
 	{ "pm_only", 0600, queue_pm_only_show, NULL },
 	{ "state", 0600, queue_state_show, queue_state_write },
-	{ "write_hints", 0600, queue_write_hint_show, queue_write_hint_store },
 	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
 	{ },
 };
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ce77250316..f28023b0a87eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2402,7 +2402,6 @@  static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		rq->cmd_flags |= REQ_FAILFAST_MASK;
 
 	rq->__sector = bio->bi_iter.bi_sector;
-	rq->write_hint = bio->bi_write_hint;
 	blk_rq_bio_prep(rq, bio, nr_segs);
 
 	/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
diff --git a/block/bounce.c b/block/bounce.c
index 3d50d19cde72a..9db1256d57d55 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,7 +169,6 @@  static struct bio *bounce_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
-	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/fops.c b/block/fops.c
index 3696665e586a8..037f12294c863 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -83,7 +83,6 @@  static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio.bi_write_hint = iocb->ki_hint;
 	bio.bi_private = current;
 	bio.bi_end_io = blkdev_bio_end_io_simple;
 	bio.bi_ioprio = iocb->ki_ioprio;
@@ -225,7 +224,6 @@  static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-		bio->bi_write_hint = iocb->ki_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -325,7 +323,6 @@  static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio->bi_write_hint = iocb->ki_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c3288d46948de..c3ce6afd60a71 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1137,8 +1137,6 @@  static void alloc_behind_master_bio(struct r1bio *r1_bio,
 		goto skip_copy;
 	}
 
-	behind_bio->bi_write_hint = bio->bi_write_hint;
-
 	while (i < vcnt && size) {
 		struct page *page;
 		int len = min_t(int, PAGE_SIZE, size);
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 3446797fa0aca..282ce16ee4e1f 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -469,7 +469,6 @@  static void ppl_submit_iounit(struct ppl_io_unit *io)
 	bio_set_dev(bio, log->rdev->bdev);
 	bio->bi_iter.bi_sector = log->next_io_sector;
 	bio_add_page(bio, io->header_page, PAGE_SIZE, 0);
-	bio->bi_write_hint = ppl_conf->write_hint;
 
 	pr_debug("%s: log->current_io_sector: %llu\n", __func__,
 	    (unsigned long long)log->next_io_sector);
@@ -499,7 +498,6 @@  static void ppl_submit_iounit(struct ppl_io_unit *io)
 			bio = bio_alloc_bioset(prev->bi_bdev, BIO_MAX_VECS,
 					       prev->bi_opf, GFP_NOIO,
 					       &ppl_conf->bs);
-			bio->bi_write_hint = prev->bi_write_hint;
 			bio->bi_iter.bi_sector = bio_end_sector(prev);
 			bio_add_page(bio, sh->ppl_page, PAGE_SIZE, 0);
 
@@ -1402,7 +1400,6 @@  int ppl_init_log(struct r5conf *conf)
 	atomic64_set(&ppl_conf->seq, 0);
 	INIT_LIST_HEAD(&ppl_conf->no_mem_stripes);
 	spin_lock_init(&ppl_conf->no_mem_stripes_lock);
-	ppl_conf->write_hint = RWH_WRITE_LIFE_NOT_SET;
 
 	if (!mddev->external) {
 		ppl_conf->signature = ~crc32c_le(~0, mddev->uuid, sizeof(mddev->uuid));
@@ -1501,25 +1498,13 @@  int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add)
 static ssize_t
 ppl_write_hint_show(struct mddev *mddev, char *buf)
 {
-	size_t ret = 0;
-	struct r5conf *conf;
-	struct ppl_conf *ppl_conf = NULL;
-
-	spin_lock(&mddev->lock);
-	conf = mddev->private;
-	if (conf && raid5_has_ppl(conf))
-		ppl_conf = conf->log_private;
-	ret = sprintf(buf, "%d\n", ppl_conf ? ppl_conf->write_hint : 0);
-	spin_unlock(&mddev->lock);
-
-	return ret;
+	return sprintf(buf, "%d\n", 0);
 }
 
 static ssize_t
 ppl_write_hint_store(struct mddev *mddev, const char *page, size_t len)
 {
 	struct r5conf *conf;
-	struct ppl_conf *ppl_conf;
 	int err = 0;
 	unsigned short new;
 
@@ -1533,17 +1518,10 @@  ppl_write_hint_store(struct mddev *mddev, const char *page, size_t len)
 		return err;
 
 	conf = mddev->private;
-	if (!conf) {
+	if (!conf)
 		err = -ENODEV;
-	} else if (raid5_has_ppl(conf)) {
-		ppl_conf = conf->log_private;
-		if (!ppl_conf)
-			err = -EINVAL;
-		else
-			ppl_conf->write_hint = new;
-	} else {
+	else if (!raid5_has_ppl(conf) || !conf->log_private)
 		err = -EINVAL;
-	}
 
 	mddev_unlock(mddev);
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8891aaba65964..78503db55ca4e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1210,9 +1210,6 @@  static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			bi->bi_io_vec[0].bv_len = RAID5_STRIPE_SIZE(conf);
 			bi->bi_io_vec[0].bv_offset = sh->dev[i].offset;
 			bi->bi_iter.bi_size = RAID5_STRIPE_SIZE(conf);
-			bi->bi_write_hint = sh->dev[i].write_hint;
-			if (!rrdev)
-				sh->dev[i].write_hint = RWH_WRITE_LIFE_NOT_SET;
 			/*
 			 * If this is discard request, set bi_vcnt 0. We don't
 			 * want to confuse SCSI because SCSI will replace payload
@@ -1264,8 +1261,6 @@  static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			rbi->bi_io_vec[0].bv_len = RAID5_STRIPE_SIZE(conf);
 			rbi->bi_io_vec[0].bv_offset = sh->dev[i].offset;
 			rbi->bi_iter.bi_size = RAID5_STRIPE_SIZE(conf);
-			rbi->bi_write_hint = sh->dev[i].write_hint;
-			sh->dev[i].write_hint = RWH_WRITE_LIFE_NOT_SET;
 			/*
 			 * If this is discard request, set bi_vcnt 0. We don't
 			 * want to confuse SCSI because SCSI will replace payload
@@ -3416,7 +3411,6 @@  static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 		(unsigned long long)sh->sector);
 
 	spin_lock_irq(&sh->stripe_lock);
-	sh->dev[dd_idx].write_hint = bi->bi_write_hint;
 	/* Don't allow new IO added to stripes in batch list */
 	if (sh->batch_head)
 		goto overlap;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dee86911a4bef..1412f09537c73 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3321,7 +3321,6 @@  static int alloc_new_bio(struct btrfs_inode *inode,
 	bio_ctrl->bio_flags = bio_flags;
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = &inode->io_tree;
-	bio->bi_write_hint = inode->vfs_inode.i_write_hint;
 	bio->bi_opf = opf;
 	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 	if (ret < 0)
diff --git a/fs/buffer.c b/fs/buffer.c
index a17c386a142c7..29c6c60660f61 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -53,7 +53,7 @@ 
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-			 enum rw_hint hint, struct writeback_control *wbc);
+			 struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1806,8 +1806,7 @@  int __block_write_full_page(struct inode *inode, struct page *page,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh,
-					inode->i_write_hint, wbc);
+			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1861,8 +1860,7 @@  int __block_write_full_page(struct inode *inode, struct page *page,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh,
-					inode->i_write_hint, wbc);
+			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -3008,7 +3006,7 @@  static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-			 enum rw_hint write_hint, struct writeback_control *wbc)
+			 struct writeback_control *wbc)
 {
 	struct bio *bio;
 
@@ -3034,7 +3032,6 @@  static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio->bi_write_hint = write_hint;
 
 	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
@@ -3056,7 +3053,7 @@  static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
-	return submit_bh_wbc(op, op_flags, bh, 0, NULL);
+	return submit_bh_wbc(op, op_flags, bh, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 38bca4980a1ca..aef06e607b405 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -402,9 +402,6 @@  dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
-
-	bio->bi_write_hint = dio->iocb->ki_hint;
-
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1253982268730..0c31b61c7628a 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -374,7 +374,6 @@  void ext4_io_submit(struct ext4_io_submit *io)
 	if (bio) {
 		int io_op_flags = io->io_wbc->sync_mode == WB_SYNC_ALL ?
 				  REQ_SYNC : 0;
-		io->io_bio->bi_write_hint = io->io_end->inode->i_write_hint;
 		bio_set_op_attrs(io->io_bio, REQ_OP_WRITE, io_op_flags);
 		submit_bio(io->io_bio);
 	}
@@ -420,10 +419,8 @@  static void io_submit_add_bh(struct ext4_io_submit *io,
 submit_and_retry:
 		ext4_io_submit(io);
 	}
-	if (io->io_bio == NULL) {
+	if (io->io_bio == NULL)
 		io_submit_init_bio(io, bh);
-		io->io_bio->bi_write_hint = inode->i_write_hint;
-	}
 	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh));
 	if (ret != bh->b_size)
 		goto submit_and_retry;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index e71dde8de0db0..20d65aa6243a1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -403,8 +403,6 @@  static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 	} else {
 		bio->bi_end_io = f2fs_write_end_io;
 		bio->bi_private = sbi;
-		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
-						fio->type, fio->temp);
 	}
 	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
 
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 4ae1eefae616d..6ba51cbb94cf2 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -491,7 +491,6 @@  static struct bio *gfs2_chain_bio(struct bio *prev, unsigned int nr_iovecs)
 	new = bio_alloc(prev->bi_bdev, nr_iovecs, prev->bi_opf, GFP_NOIO);
 	bio_clone_blkg_association(new, prev);
 	new->bi_iter.bi_sector = bio_end_sector(prev);
-	new->bi_write_hint = prev->bi_write_hint;
 	bio_chain(new, prev);
 	submit_bio(prev);
 	return new;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 491534e908615..fedca4de5f6b9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1199,7 +1199,6 @@  iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = sector;
-	bio->bi_write_hint = inode->i_write_hint;
 	wbc_init_bio(wbc, bio);
 
 	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
@@ -1228,7 +1227,6 @@  iomap_chain_bio(struct bio *prev)
 	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
 	bio_clone_blkg_association(new, prev);
 	new->bi_iter.bi_sector = bio_end_sector(prev);
-	new->bi_write_hint = prev->bi_write_hint;
 
 	bio_chain(prev, new);
 	bio_get(prev);		/* for iomap_finish_ioend */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e2ba13645ef28..a434b1829545d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -309,7 +309,6 @@  static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 
 		bio = bio_alloc(iomap->bdev, nr_pages, bio_opf, GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
-		bio->bi_write_hint = dio->iocb->ki_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/mpage.c b/fs/mpage.c
index dbfc02e23d97f..89eeef275a235 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -590,7 +590,6 @@  static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 		bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
 
 		wbc_init_bio(wbc, bio);
-		bio->bi_write_hint = inode->i_write_hint;
 	}
 
 	/*
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index d331b52592a0a..b71a23dd12556 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -695,7 +695,6 @@  static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 	bio = bio_alloc(bdev, nr_pages,
 			REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS);
 	bio->bi_iter.bi_sector = zi->i_zsector;
-	bio->bi_write_hint = iocb->ki_hint;
 	bio->bi_ioprio = iocb->ki_ioprio;
 	if (iocb->ki_flags & IOCB_DSYNC)
 		bio->bi_opf |= REQ_FUA;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5561e58d158ac..ba8cfa57255f3 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -250,7 +250,6 @@  struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
-	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e19947d84f128..bb6d5ee02c070 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -518,9 +518,6 @@  struct request_queue {
 
 	bool			mq_sysfs_init_done;
 
-#define BLK_MAX_WRITE_HINTS	5
-	u64			write_hints[BLK_MAX_WRITE_HINTS];
-
 	/*
 	 * Independent sector access ranges. This is always NULL for
 	 * devices that do not have multiple independent access ranges.