diff mbox series

xfs: don't allow log recover IO to be throttled

Message ID 20250303112301.766938-1-alexjlzheng@tencent.com (mailing list archive)
State New
Headers show
Series xfs: don't allow log recover IO to be throttled | expand

Commit Message

Jinliang Zheng March 3, 2025, 11:23 a.m. UTC
When recovering a large filesystem, avoid log recover IO being
throttled by rq_qos_throttle().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
 fs/xfs/xfs_bio_io.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig March 3, 2025, 2:08 p.m. UTC | #1
On Mon, Mar 03, 2025 at 07:23:01PM +0800, Jinliang Zheng wrote:
> When recovering a large filesystem, avoid log recover IO being
> throttled by rq_qos_throttle().

Why?  Do you have numbers or a bug report?

> diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
> index fe21c76f75b8..259955f2aeb2 100644
> --- a/fs/xfs/xfs_bio_io.c
> +++ b/fs/xfs/xfs_bio_io.c
> @@ -22,12 +22,15 @@ xfs_rw_bdev(
>  	unsigned int		left = count;
>  	int			error;
>  	struct bio		*bio;
> +	blk_opf_t		opf = op | REQ_META | REQ_SYNC;
>  
>  	if (is_vmalloc && op == REQ_OP_WRITE)
>  		flush_kernel_vmap_range(data, count);
>  
> -	bio = bio_alloc(bdev, bio_max_vecs(left), op | REQ_META | REQ_SYNC,
> -			GFP_KERNEL);
> +	if (op == REQ_OP_WRITE)
> +		opf |= REQ_IDLE;

And there's really no need to do any games with the op here.  Do it in
the caller and document why it's done there.
Dave Chinner March 3, 2025, 8:45 p.m. UTC | #2
On Mon, Mar 03, 2025 at 07:23:01PM +0800, Jinliang Zheng wrote:
> When recovering a large filesystem, avoid log recover IO being
> throttled by rq_qos_throttle().

Why?

The only writes to the journal during recovery are to clear stale
blocks - it's only a very small part of the IO that journal recovery
typically does. What problem happens when these writes are
throttled?

-Dave.
Jinliang Zheng March 9, 2025, 12:41 p.m. UTC | #3
On Tue, 4 Mar 2025 07:45:44 +1100, Dave Chinner wrote:
> On Mon, Mar 03, 2025 at 07:23:01PM +0800, Jinliang Zheng wrote:
> > When recovering a large filesystem, avoid log recover IO being
> > throttled by rq_qos_throttle().
> 
> Why?
> 
> The only writes to the journal during recovery are to clear stale
> blocks - it's only a very small part of the IO that journal recovery
> typically does. What problem happens when these writes are
> throttled?

Sorry for the late reply, I was struggling with my work. :-(

Recently, we encountered the problem of xfs log IO being throttled in
the Linux distribution version maintained by ourselves. To be more
precise, it was indirectly throttled by the IO issued by the LVM layer.
For details, see [1] please.

After this problem was solved, we naturally checked other related log
IO paths, hoping that they would not be throttled by wbt_wait(), that
is, we hoped that they would be marked with REQ_SYNC | REQ_IDLE.

For log recover IO, in the LVM scenario, we are not sure whether it
will be affected by IO on other LVs on the same PV. In addition, we
did not find any obvious side effects of this patch. An ounce of
prevention is worth a pound of cure, and we think it is more
appropriate to add REQ_IDLE here.

Of course, if there is really a reason not to consider being throttled,
please forgive me for disturbing you.

[1] https://lore.kernel.org/linux-xfs/20250220112014.3209940-1-alexjlzheng@tencent.com/

Thank you very much. :)
Jinliang Zheng

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Carlos Maiolino March 10, 2025, 1:17 p.m. UTC | #4
On Sun, Mar 09, 2025 at 08:41:33PM +0800, Jinliang Zheng wrote:
> On Tue, 4 Mar 2025 07:45:44 +1100, Dave Chinner wrote:
> > On Mon, Mar 03, 2025 at 07:23:01PM +0800, Jinliang Zheng wrote:
> > > When recovering a large filesystem, avoid log recover IO being
> > > throttled by rq_qos_throttle().
> >
> > Why?
> >
> > The only writes to the journal during recovery are to clear stale
> > blocks - it's only a very small part of the IO that journal recovery
> > typically does. What problem happens when these writes are
> > throttled?
> 
> Sorry for the late reply, I was struggling with my work. :-(
> 
> Recently, we encountered the problem of xfs log IO being throttled in
> the Linux distribution version maintained by ourselves. To be more
> precise, it was indirectly throttled by the IO issued by the LVM layer.
> For details, see [1] please.

Ok, so you properly fixed the problem on the DM layer.

> 
> After this problem was solved, we naturally checked other related log
> IO paths, hoping that they would not be throttled by wbt_wait(), that
> is, we hoped that they would be marked with REQ_SYNC | REQ_IDLE.
> 
> For log recover IO, in the LVM scenario, we are not sure whether it
> will be affected by IO on other LVs on the same PV. In addition, we
> did not find any obvious side effects of this patch. An ounce of
> prevention is worth a pound of cure, and we think it is more
> appropriate to add REQ_IDLE here.

If you notice any problem with this that you're trying to fix, or if
this change improves anything, please specify that in the commit message
 - also addressing comments by Christoph, i.e. xfs_rw_bdev shouldn't be
messing with request ops - Just because it has no side-effects is not
a good reason. Regular Log IO being throttled by the DM layer is indeed
a problem, but considering the very small amount of data written here
during log recovery doesn't seem a good use of REQ_IDLE.

So, for now, NAK.

Carlos


> 
> Of course, if there is really a reason not to consider being throttled,
> please forgive me for disturbing you.
> 
> [1] https://lore.kernel.org/linux-xfs/20250220112014.3209940-1-alexjlzheng@tencent.com/
> 
> Thank you very much. :)
> Jinliang Zheng
> 
> >
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_bio_io.c b/fs/xfs/xfs_bio_io.c
index fe21c76f75b8..259955f2aeb2 100644
--- a/fs/xfs/xfs_bio_io.c
+++ b/fs/xfs/xfs_bio_io.c
@@ -22,12 +22,15 @@  xfs_rw_bdev(
 	unsigned int		left = count;
 	int			error;
 	struct bio		*bio;
+	blk_opf_t		opf = op | REQ_META | REQ_SYNC;
 
 	if (is_vmalloc && op == REQ_OP_WRITE)
 		flush_kernel_vmap_range(data, count);
 
-	bio = bio_alloc(bdev, bio_max_vecs(left), op | REQ_META | REQ_SYNC,
-			GFP_KERNEL);
+	if (op == REQ_OP_WRITE)
+		opf |= REQ_IDLE;
+
+	bio = bio_alloc(bdev, bio_max_vecs(left), opf, GFP_KERNEL);
 	bio->bi_iter.bi_sector = sector;
 
 	do {