exfat: enable request merging for dir readahead

Message ID

20250407102345.50130-1-ailiop@suse.com (mailing list archive)

State

New

Headers

From: Anthony Iliopoulos <ailiop@suse.com>
To: Namjae Jeon <linkinjeon@kernel.org>,
	Sungjong Seo <sj1557.seo@samsung.com>,
	Yuezhang Mo <yuezhang.mo@sony.com>
Cc: linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH] exfat: enable request merging for dir readahead
Date: Mon,  7 Apr 2025 12:23:44 +0200
Message-ID: <20250407102345.50130-1-ailiop@suse.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

exfat: enable request merging for dir readahead

Commit Message

Anthony Iliopoulos April 7, 2025, 10:23 a.m. UTC

Directory listings that need to access inode metadata (e.g. via statx
to obtain the file types) take a long time on large filesystems with
lots of metadata that is not yet in dcache. The directory readahead
submits one I/O request at a time, and although these requests target
sequential disk sectors (up to EXFAT_MAX_RA_SIZE), they are not merged
at the block layer.

Add plugging around sb_breadahead so that the requests can be batched
and submitted jointly to the block layer where they can be merged by the
io schedulers, instead of having each request individually submitted to
the hardware queues.

This significantly improves the throughput of directory listings as it
also minimizes the number of io completions and related handling from
the device driver side.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
 fs/exfat/dir.c | 3 +++
 1 file changed, 3 insertions(+)
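
A rough user-space model of the merging effect described above, not kernel code: the `sim_*` names and the simple back-merge rule are illustrative assumptions, and the real block layer and I/O schedulers are considerably more involved. It counts how many requests would reach the device for a run of sequential single-sector reads, with and without a plug held across the loop.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_REQS 64

/* Hypothetical stand-in for a block request: a run of contiguous sectors. */
struct sim_req {
	unsigned long long start;
	unsigned int nr;
};

/* Hypothetical stand-in for a plug: a list of requests not yet dispatched. */
struct sim_plug {
	struct sim_req reqs[SIM_MAX_REQS];
	size_t count;
};

/* Queue a one-sector read; back-merge it if it extends the last request. */
static void sim_submit(struct sim_plug *plug, unsigned long long sector)
{
	if (plug->count > 0) {
		struct sim_req *last = &plug->reqs[plug->count - 1];

		if (last->start + last->nr == sector) {
			last->nr++;
			return;
		}
	}
	assert(plug->count < SIM_MAX_REQS);
	plug->reqs[plug->count].start = sector;
	plug->reqs[plug->count].nr = 1;
	plug->count++;
}

/*
 * Return the number of requests dispatched for nr sequential sectors.
 * Without a plug, every read is dispatched immediately and cannot merge;
 * with a plug, the reads are held until the end and merged first.
 */
static size_t sim_readahead(unsigned long long sec, unsigned int nr, int plugged)
{
	struct sim_plug plug = { .count = 0 };
	size_t dispatched = 0;

	for (unsigned int i = 0; i < nr; i++) {
		sim_submit(&plug, sec + i);
		if (!plugged) {
			dispatched += plug.count;
			plug.count = 0;
		}
	}
	return dispatched + plug.count;
}
```

Under these assumptions, `sim_readahead(100, 8, 0)` returns 8 and `sim_readahead(100, 8, 1)` returns 1.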

Comments

Namjae Jeon April 7, 2025, 1 p.m. UTC | #1

On Mon, Apr 7, 2025 at 7:23 PM Anthony Iliopoulos <ailiop@suse.com> wrote:
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c

Hi Anthony,
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>         struct exfat_sb_info *sbi = EXFAT_SB(sb);
>         struct buffer_head *bh;
> +       struct blk_plug plug;
>         unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>         unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>         unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>         if (!bh || !buffer_uptodate(bh)) {
>                 unsigned int i;
It would be better to move the plug declaration here, into the block
that uses it.
Thanks!
>
> +               blk_start_plug(&plug);
>                 for (i = 0; i < ra_count; i++)
>                         sb_breadahead(sb, (sector_t)(sec + i));
> +               blk_finish_plug(&plug);
>         }
>         brelse(bh);
>         return 0;
> --
> 2.49.0
>

Sungjong Seo April 8, 2025, 1:15 a.m. UTC | #2

Hi, Anthony

Good approach. However, Samsung's code tried this in the past, and the
latency of directory-related operations became longer when ra_count was
large (possibly up to MAX_RA_SIZE).
In the most recent code, blk_flush_plug is called once per page, as
follows.

```
blk_start_plug(&plug);
for (i = 0; i < ra_count; i++) {
        if (i && !(i & (sects_per_page - 1)))
                blk_flush_plug(&plug, false);
        sb_breadahead(sb, sec + i);
}
blk_finish_plug(&plug);
```

However, since blk_flush_plug is not exported, it can no longer be used in
a module build. It seems that either blk_flush_plug needs to be exported,
or the loop needs to be changed to repeat blk_start_plug and
blk_finish_plug once per page.

After changing to per-page plugging, could you also compare the throughput?

Thanks
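
For illustration, the second alternative mentioned above (repeating the start/finish pair once per page instead of calling blk_flush_plug) might look roughly like the following. This is a user-space model, not a tested kernel patch: the `mock_*` functions are hypothetical stand-ins for blk_start_plug, blk_finish_plug and sb_breadahead, and the function only counts how many batches would be submitted. Like the snippet above, it assumes sects_per_page is a power of two.

```c
#include <assert.h>

/* Hypothetical stand-in for struct blk_plug: counts reads not yet submitted. */
struct mock_plug {
	unsigned int pending;
};

static void mock_start_plug(struct mock_plug *plug)
{
	plug->pending = 0;
}

/* Submit whatever is pending; return 1 if a batch was actually sent. */
static unsigned int mock_finish_plug(struct mock_plug *plug)
{
	unsigned int sent = plug->pending ? 1 : 0;

	plug->pending = 0;
	return sent;
}

/* Stand-in for sb_breadahead(): queue one sector read under the plug. */
static void mock_breadahead(struct mock_plug *plug)
{
	plug->pending++;
}

/* Return the number of batches submitted for ra_count sectors. */
static unsigned int readahead_per_page(unsigned int ra_count,
				       unsigned int sects_per_page)
{
	struct mock_plug plug;
	unsigned int i, batches = 0;

	/* The mask test below is only valid for a power of two. */
	assert(sects_per_page && !(sects_per_page & (sects_per_page - 1)));

	mock_start_plug(&plug);
	for (i = 0; i < ra_count; i++) {
		/* At each page boundary, close the plug and open a new one. */
		if (i && !(i & (sects_per_page - 1))) {
			batches += mock_finish_plug(&plug);
			mock_start_plug(&plug);
		}
		mock_breadahead(&plug);
	}
	batches += mock_finish_plug(&plug);
	return batches;
}
```

With 64 sectors and 8 sectors per page, this model submits 8 batches rather than one, which bounds how much I/O is held back at any point. Whether that reproduces the latency behaviour of the blk_flush_plug version would still need to be measured.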

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 3103b932b674..a46ab2690b4d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -621,6 +621,7 @@  static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	struct buffer_head *bh;
+	struct blk_plug plug;
 	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
 	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
 	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
@@ -644,8 +645,10 @@  static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 	if (!bh || !buffer_uptodate(bh)) {
 		unsigned int i;
 
+		blk_start_plug(&plug);
 		for (i = 0; i < ra_count; i++)
 			sb_breadahead(sb, (sector_t)(sec + i));
+		blk_finish_plug(&plug);
 	}
 	brelse(bh);
 	return 0;
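
As a worked example of the three readahead counts computed in the context lines of the first hunk, the sketch below reproduces the same shifts and max() in user space. The helper and struct names are hypothetical, and the input values used in the note that follows (a 128 KiB readahead limit, 4 KiB pages, 512-byte sectors, 64 sectors per cluster) are illustrative assumptions rather than values taken from the patch.

```c
#include <assert.h>

/* Hypothetical container for the three counts computed in the hunk. */
struct ra_counts {
	unsigned int max_ra_count;	/* sectors in the readahead limit */
	unsigned int page_ra_count;	/* sectors in one page */
	unsigned int adj_ra_count;	/* larger of cluster size and page size */
};

static struct ra_counts ra_counts_for(unsigned int max_ra_size,
				      unsigned int page_size,
				      unsigned int blocksize_bits,
				      unsigned int sect_per_clus)
{
	struct ra_counts c;

	assert(blocksize_bits < 32);
	/* Byte sizes become sector counts by shifting out the block size. */
	c.max_ra_count = max_ra_size >> blocksize_bits;
	c.page_ra_count = page_size >> blocksize_bits;
	c.adj_ra_count = sect_per_clus > c.page_ra_count ?
			 sect_per_clus : c.page_ra_count;
	return c;
}
```

With the assumed inputs, `ra_counts_for(128 * 1024, 4096, 9, 64)` gives max_ra_count = 256, page_ra_count = 8 and adj_ra_count = 64.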
