Message ID | 20250407102345.50130-1-ailiop@suse.com (mailing list archive) |
---|---|
State | New |
Series | exfat: enable request merging for dir readahead |
On Mon, Apr 7, 2025 at 7:23 PM Anthony Iliopoulos <ailiop@suse.com> wrote:
>
> Directory listings that need to access the inode metadata (e.g. via
> statx to obtain the file types) of large filesystems with lots of
> metadata that aren't yet in dcache, will take a long time due to the
> directory readahead submitting one io request at a time which although
> targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
> merged at the block layer.
>
> Add plugging around sb_breadahead so that the requests can be batched
> and submitted jointly to the block layer where they can be merged by the
> io schedulers, instead of having each request individually submitted to
> the hardware queues.
>
> This significantly improves the throughput of directory listings as it
> also minimizes the number of io completions and related handling from
> the device driver side.
>
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
>  fs/exfat/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c

Hi Anthony,

> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>  	struct exfat_sb_info *sbi = EXFAT_SB(sb);
>  	struct buffer_head *bh;
> +	struct blk_plug plug;
>  	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>  	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>  	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  	if (!bh || !buffer_uptodate(bh)) {
>  		unsigned int i;

It is better to move the plug declaration here.

Thanks!

>
> +		blk_start_plug(&plug);
>  		for (i = 0; i < ra_count; i++)
>  			sb_breadahead(sb, (sector_t)(sec + i));
> +		blk_finish_plug(&plug);
>  	}
>  	brelse(bh);
>  	return 0;
> --
> 2.49.0
>
Hi, Anthony

> Directory listings that need to access the inode metadata (e.g. via
> statx to obtain the file types) of large filesystems with lots of
> metadata that aren't yet in dcache, will take a long time due to the
> directory readahead submitting one io request at a time which although
> targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
> merged at the block layer.
>
> Add plugging around sb_breadahead so that the requests can be batched
> and submitted jointly to the block layer where they can be merged by the
> io schedulers, instead of having each request individually submitted to
> the hardware queues.
>
> This significantly improves the throughput of directory listings as it
> also minimizes the number of io completions and related handling from
> the device driver side.

Good approach. However, this was tried in past Samsung code, and it caused
a problem: the latency of directory-related operations became longer when
ra_count was large (possibly up to MAX_RA_SIZE).
In the most recent code, blk_flush_plug is called once per page, as follows.

```
blk_start_plug(&plug);
for (i = 0; i < ra_count; i++) {
	if (i && !(i & (sects_per_page - 1)))
		blk_flush_plug(&plug, false);
	sb_breadahead(sb, sec + i);
}
blk_finish_plug(&plug);
```

However, since blk_flush_plug is not exported, it can no longer be used in
a module build. It seems that either blk_flush_plug needs to be exported,
or the code needs to be changed to repeat blk_start_plug and
blk_finish_plug once per page.

After changing to per-page plugging, could you also compare the throughput?

Thanks

>
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
>  fs/exfat/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>  	struct exfat_sb_info *sbi = EXFAT_SB(sb);
>  	struct buffer_head *bh;
> +	struct blk_plug plug;
>  	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>  	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>  	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  	if (!bh || !buffer_uptodate(bh)) {
>  		unsigned int i;
>
> +		blk_start_plug(&plug);
>  		for (i = 0; i < ra_count; i++)
>  			sb_breadahead(sb, (sector_t)(sec + i));
> +		blk_finish_plug(&plug);
>  	}
>  	brelse(bh);
>  	return 0;
> --
> 2.49.0
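The second option raised in the reply above (repeating blk_start_plug and blk_finish_plug once per page instead of calling blk_flush_plug) could look roughly like the sketch below. This is a userspace model, not kernel code: struct blk_plug, blk_start_plug and blk_finish_plug here are counting stand-ins that only borrow the kernel names, and mock_breadahead, nr_submitted, nr_flushes and readahead_plug_per_page are hypothetical names introduced for illustration. It also assumes sects_per_page is a power of two, as the mask test in the quoted snippet implies.

```c
#include <assert.h>

/*
 * Userspace stand-ins (assumptions, NOT the kernel API): they only count
 * events so that the loop structure can be exercised outside the kernel.
 */
struct blk_plug {
	unsigned int queued;	/* requests held back by this plug */
};

static unsigned int nr_submitted;	/* readahead requests issued in total */
static unsigned int nr_flushes;		/* non-empty batches handed downwards */

static void blk_start_plug(struct blk_plug *plug)
{
	plug->queued = 0;
}

static void blk_finish_plug(struct blk_plug *plug)
{
	if (plug->queued)
		nr_flushes++;
	plug->queued = 0;
}

/* Hypothetical stand-in for sb_breadahead(): queue one sector read. */
static void mock_breadahead(struct blk_plug *plug, unsigned long long sec)
{
	(void)sec;
	plug->queued++;
	nr_submitted++;
}

/*
 * Read ahead ra_count sectors starting at sec, closing and reopening the
 * plug at every page boundary so that no batch grows beyond one page.
 */
static void readahead_plug_per_page(unsigned long long sec,
				    unsigned int ra_count,
				    unsigned int sects_per_page)
{
	struct blk_plug plug;
	unsigned int i;

	/* The mask test below is only valid for a power of two. */
	assert(sects_per_page && !(sects_per_page & (sects_per_page - 1)));

	blk_start_plug(&plug);
	for (i = 0; i < ra_count; i++) {
		if (i && !(i & (sects_per_page - 1))) {
			/* Page boundary: submit this batch, start the next. */
			blk_finish_plug(&plug);
			blk_start_plug(&plug);
		}
		mock_breadahead(&plug, sec + i);
	}
	blk_finish_plug(&plug);
}
```

In this model, 32 sectors with 8 sectors per page produce four batches rather than one. Whether that trade-off between latency and merging pays off on real hardware is exactly what the requested throughput comparison would have to show.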
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 3103b932b674..a46ab2690b4d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	struct buffer_head *bh;
+	struct blk_plug plug;
 	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
 	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
 	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
@@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
 	if (!bh || !buffer_uptodate(bh)) {
 		unsigned int i;
 
+		blk_start_plug(&plug);
 		for (i = 0; i < ra_count; i++)
 			sb_breadahead(sb, (sector_t)(sec + i));
+		blk_finish_plug(&plug);
 	}
 	brelse(bh);
 	return 0;
Directory listings that need to access the inode metadata (e.g. via
statx to obtain the file types) of large filesystems with lots of
metadata that aren't yet in dcache, will take a long time due to the
directory readahead submitting one io request at a time which although
targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
merged at the block layer.

Add plugging around sb_breadahead so that the requests can be batched
and submitted jointly to the block layer where they can be merged by the
io schedulers, instead of having each request individually submitted to
the hardware queues.

This significantly improves the throughput of directory listings as it
also minimizes the number of io completions and related handling from
the device driver side.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
 fs/exfat/dir.c | 3 +++
 1 file changed, 3 insertions(+)
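To illustrate the effect the commit message describes, here is a deliberately simplified userspace model of request merging. It assumes each readahead request covers exactly one sector and that a plugged batch merges every run of contiguous sectors into a single dispatched request. count_dispatched is a hypothetical helper written for this sketch. Real block-layer merging also depends on queue limits and the I/O scheduler, which the model ignores.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: return how many requests reach
 * the device for the given per-sector reads. Without a plug every read
 * is dispatched on its own. With a plug, a read that directly follows
 * the previous sector is merged into the previous request.
 */
static size_t count_dispatched(const unsigned long long *secs, size_t n,
			       int plugged)
{
	size_t i, dispatched = 0;

	for (i = 0; i < n; i++) {
		/* Merge only when plugged and contiguous with the last read. */
		if (plugged && i > 0 && secs[i] == secs[i - 1] + 1)
			continue;
		dispatched++;
	}
	return dispatched;
}
```

Under these assumptions, eight sequential sectors are dispatched as eight requests without a plug and as one request with it, which also means one completion to handle instead of eight.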