diff mbox series

exfat: enable request merging for dir readahead

Message ID 20250407102345.50130-1-ailiop@suse.com (mailing list archive)
State New

Commit Message

Anthony Iliopoulos April 7, 2025, 10:23 a.m. UTC
Directory listings that need to access the inode metadata (e.g. via
statx to obtain the file types) of large filesystems with lots of
metadata that aren't yet in dcache, will take a long time due to the
directory readahead submitting one io request at a time which although
targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
merged at the block layer.

Add plugging around sb_breadahead so that the requests can be batched
and submitted jointly to the block layer where they can be merged by the
io schedulers, instead of having each request individually submitted to
the hardware queues.

This significantly improves the throughput of directory listings as it
also minimizes the number of io completions and related handling from
the device driver side.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
 fs/exfat/dir.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Namjae Jeon April 7, 2025, 1 p.m. UTC | #1
On Mon, Apr 7, 2025 at 7:23 PM Anthony Iliopoulos <ailiop@suse.com> wrote:
>
> Directory listings that need to access the inode metadata (e.g. via
> statx to obtain the file types) of large filesystems with lots of
> metadata that aren't yet in dcache, will take a long time due to the
> directory readahead submitting one io request at a time which although
> targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
> merged at the block layer.
>
> Add plugging around sb_breadahead so that the requests can be batched
> and submitted jointly to the block layer where they can be merged by the
> io schedulers, instead of having each request individually submitted to
> the hardware queues.
>
> This significantly improves the throughput of directory listings as it
> also minimizes the number of io completions and related handling from
> the device driver side.
>
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
>  fs/exfat/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c

Hi Anthony,
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>         struct exfat_sb_info *sbi = EXFAT_SB(sb);
>         struct buffer_head *bh;
> +       struct blk_plug plug;
>         unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>         unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>         unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>         if (!bh || !buffer_uptodate(bh)) {
>                 unsigned int i;
It would be better to move the plug declaration here.
Thanks!
>
> +               blk_start_plug(&plug);
>                 for (i = 0; i < ra_count; i++)
>                         sb_breadahead(sb, (sector_t)(sec + i));
> +               blk_finish_plug(&plug);
>         }
>         brelse(bh);
>         return 0;
> --
> 2.49.0
>
Sungjong Seo April 8, 2025, 1:15 a.m. UTC | #2
Hi, Anthony

> Directory listings that need to access the inode metadata (e.g. via
> statx to obtain the file types) of large filesystems with lots of
> metadata that aren't yet in dcache, will take a long time due to the
> directory readahead submitting one io request at a time which although
> targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
> merged at the block layer.
> 
> Add plugging around sb_breadahead so that the requests can be batched
> and submitted jointly to the block layer where they can be merged by the
> io schedulers, instead of having each request individually submitted to
> the hardware queues.
> 
> This significantly improves the throughput of directory listings as it
> also minimizes the number of io completions and related handling from
> the device driver side.

Good approach. However, this was tried in the past in Samsung's code,
and it caused a problem: the latency of directory-related operations
increased when ra_count was large (perhaps as large as MAX_RA_SIZE).
In the most recent code, blk_flush_plug is called once per page, as
follows.

```
blk_start_plug(&plug);
for (i = 0; i < ra_count; i++) {
        if (i && !(i & (sects_per_page - 1)))
                blk_flush_plug(&plug, false);
        sb_breadahead(sb, sec + i);
}
blk_finish_plug(&plug);
```

However, since blk_flush_plug is not exported, it can no longer be used
in a module build. Either blk_flush_plug needs to be exported, or the
code should be changed to repeat blk_start_plug and blk_finish_plug once
per page.

After changing to per-page plugging, could you also compare the throughput?

Thanks

> 
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
>  fs/exfat/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>  	struct exfat_sb_info *sbi = EXFAT_SB(sb);
>  	struct buffer_head *bh;
> +	struct blk_plug plug;
>  	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>  	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>  	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  	if (!bh || !buffer_uptodate(bh)) {
>  		unsigned int i;
> 
> +		blk_start_plug(&plug);
>  		for (i = 0; i < ra_count; i++)
>  			sb_breadahead(sb, (sector_t)(sec + i));
> +		blk_finish_plug(&plug);
>  	}
>  	brelse(bh);
>  	return 0;
> --
> 2.49.0
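
For illustration, the two plugging strategies discussed in comment #2 can
be compared with a small userspace model in C. This is only a sketch: the
model_* names are hypothetical stand-ins rather than kernel APIs, and it
assumes that every flushed batch reaches the block layer as one group. It
says nothing about how the real io schedulers merge requests or about
actual latency.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for a plug; not a kernel structure. */
struct model_plug {
	unsigned int pending;	/* requests queued since the last flush */
	unsigned int batches;	/* number of non-empty flushes so far */
	unsigned int largest;	/* size of the largest batch flushed */
};

static void model_start_plug(struct model_plug *p)
{
	p->pending = 0;
	p->batches = 0;
	p->largest = 0;
}

/* Hand the pending requests over as one batch; an empty flush is a no-op. */
static void model_flush_plug(struct model_plug *p)
{
	if (!p->pending)
		return;
	p->batches++;
	if (p->pending > p->largest)
		p->largest = p->pending;
	p->pending = 0;
}

/* Stand-in for one sb_breadahead() call: queue a single request. */
static void model_breadahead(struct model_plug *p)
{
	p->pending++;
}

/* One plug around the whole readahead loop, as in the posted patch. */
static struct model_plug model_plug_whole(unsigned int ra_count)
{
	struct model_plug p;
	unsigned int i;

	model_start_plug(&p);
	for (i = 0; i < ra_count; i++)
		model_breadahead(&p);
	model_flush_plug(&p);	/* stands in for blk_finish_plug() */
	return p;
}

/* Flush at every page boundary, mirroring the loop quoted in the review. */
static struct model_plug model_plug_per_page(unsigned int ra_count,
					     unsigned int sects_per_page)
{
	struct model_plug p;
	unsigned int i;

	/* The mask test below only works for a power-of-two page size. */
	assert(sects_per_page && !(sects_per_page & (sects_per_page - 1)));

	model_start_plug(&p);
	for (i = 0; i < ra_count; i++) {
		if (i && !(i & (sects_per_page - 1)))
			model_flush_plug(&p);
		model_breadahead(&p);
	}
	model_flush_plug(&p);	/* stands in for blk_finish_plug() */
	return p;
}
```

With 512-byte sectors and 4 KiB pages (8 sectors per page, both assumed
here), a 256-sector readahead gives one batch of 256 requests under a
single plug and 32 batches of 8 requests with the per-page flush.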

Patch

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 3103b932b674..a46ab2690b4d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -621,6 +621,7 @@  static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	struct buffer_head *bh;
+	struct blk_plug plug;
 	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
 	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
 	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
@@ -644,8 +645,10 @@  static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 	if (!bh || !buffer_uptodate(bh)) {
 		unsigned int i;
 
+		blk_start_plug(&plug);
 		for (i = 0; i < ra_count; i++)
 			sb_breadahead(sb, (sector_t)(sec + i));
+		blk_finish_plug(&plug);
 	}
 	brelse(bh);
 	return 0;