diff mbox series

Revert "block: simplify set_init_blocksize" to regain lost performance

Message ID 20210126195907.2273494-1-maxtram95@gmail.com (mailing list archive)
State New, archived
Headers show
Series Revert "block: simplify set_init_blocksize" to regain lost performance | expand

Commit Message

Maxim Mikityanskiy Jan. 26, 2021, 7:59 p.m. UTC
The cited commit introduced a serious regression with SATA write speed,
as found by bisecting. This patch reverts this commit, which restores
write speed back to the values observed before this commit.

The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
(WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
single HDD, the rest are different RAID levels built over the first
partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.

                | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
----------------+--------+-------+-----------+-----------+--------
9011495c9466    | R:256  | R:313 | R:276     | R:313     | R:323
(before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
----------------+--------+-------+-----------+-----------+--------
5ff9f19231a0    | R:257  | R:398 | R:312     | R:344     | R:391
(faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
----------------+--------+-------+-----------+-----------+--------
5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
----------------+--------+-------+-----------+-----------+--------
5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
patched         | W:247  | W:274 | W:220     | W:225     | W:121

Applying this patch doesn't hurt read performance, while improves the
write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
is restored back to the state before the faulty commit, and even a bit
higher in RAID tests (which aren't HDD-bound on this device) - that is
likely related to other optimizations done between the faulty commit and
5.10.10 which also improved the read speed.

Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Fixes: 5ff9f19231a0 ("block: simplify set_init_blocksize")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Ming Lei Jan. 27, 2021, 3:18 p.m. UTC | #1
On Wed, Jan 27, 2021 at 09:44:50AM +0200, Maxim Mikityanskiy wrote:
> On Wed, Jan 27, 2021 at 6:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 1/26/21 11:59 AM, Maxim Mikityanskiy wrote:
> > > The cited commit introduced a serious regression with SATA write speed,
> > > as found by bisecting. This patch reverts this commit, which restores
> > > write speed back to the values observed before this commit.
> > >
> > > The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
> > > (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
> > > single HDD, the rest are different RAID levels built over the first
> > > partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
> > >
> > >                 | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 9011495c9466    | R:256  | R:313 | R:276     | R:313     | R:323
> > > (before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5ff9f19231a0    | R:257  | R:398 | R:312     | R:344     | R:391
> > > (faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
> > > unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
> > > patched         | W:247  | W:274 | W:220     | W:225     | W:121
> > >
> > > Applying this patch doesn't hurt read performance, while improves the
> > > write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
> > > is restored back to the state before the faulty commit, and even a bit
> > > higher in RAID tests (which aren't HDD-bound on this device) - that is
> > > likely related to other optimizations done between the faulty commit and
> > > 5.10.10 which also improved the read speed.
> > >
> > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
> > > Fixes: 5ff9f19231a0 ("block: simplify set_init_blocksize")
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Jens Axboe <axboe@kernel.dk>
> > > ---
> > >  fs/block_dev.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > index 3b8963e228a1..235b5042672e 100644
> > > --- a/fs/block_dev.c
> > > +++ b/fs/block_dev.c
> > > @@ -130,7 +130,15 @@ EXPORT_SYMBOL(truncate_bdev_range);
> > >
> > >  static void set_init_blocksize(struct block_device *bdev)
> > >  {
> > > -     bdev->bd_inode->i_blkbits = blksize_bits(bdev_logical_block_size(bdev));
> > > +     unsigned int bsize = bdev_logical_block_size(bdev);
> > > +     loff_t size = i_size_read(bdev->bd_inode);
> > > +
> > > +     while (bsize < PAGE_SIZE) {
> > > +             if (size & bsize)
> > > +                     break;
> > > +             bsize <<= 1;
> > > +     }
> > > +     bdev->bd_inode->i_blkbits = blksize_bits(bsize);
> > >  }
> > >
> > >  int set_blocksize(struct block_device *bdev, int size)
> >
> > How can this patch affect write speed? I haven't found any calls of
> > set_init_blocksize() in the I/O path. Did I perhaps overlook something?
> 
> I don't know the exact mechanism how this change affects the speed,
> I'm not an expert in the block device subsystem (I'm a networking
> guy). This commit was found by git bisect, and my performance test
> confirmed that reverting it fixes the bug.
> 
> It looks to me as this function sets the block size as part of control
> flow, and this size is used later in the fast path, and the commit
> that removed the loop decreased this block size.

Right, the issue is stupid __block_write_full_page() which submits single bio
for each buffer head. And I have tried to improve the situation by merging
BHs into single bio, see below patch:

	https://lore.kernel.org/linux-block/20201230000815.3448707-1-ming.lei@redhat.com/

The above patch should improve perf for your test case.
Christoph Hellwig Jan. 27, 2021, 4:12 p.m. UTC | #2
While this code is gross, I think we need to add it back for now:

Acked-by: Christoph Hellwig <hch@lst.de>

I'll put converting the block device buffered I/O path to iomap or
an iomap lookalike on the backburner to fix this..
Jens Axboe Jan. 27, 2021, 4:15 p.m. UTC | #3
On 1/26/21 12:59 PM, Maxim Mikityanskiy wrote:
> The cited commit introduced a serious regression with SATA write speed,
> as found by bisecting. This patch reverts this commit, which restores
> write speed back to the values observed before this commit.
> 
> The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
> (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
> single HDD, the rest are different RAID levels built over the first
> partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
> 
>                 | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
> ----------------+--------+-------+-----------+-----------+--------
> 9011495c9466    | R:256  | R:313 | R:276     | R:313     | R:323
> (before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
> ----------------+--------+-------+-----------+-----------+--------
> 5ff9f19231a0    | R:257  | R:398 | R:312     | R:344     | R:391
> (faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
> ----------------+--------+-------+-----------+-----------+--------
> 5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
> unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
> ----------------+--------+-------+-----------+-----------+--------
> 5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
> patched         | W:247  | W:274 | W:220     | W:225     | W:121
> 
> Applying this patch doesn't hurt read performance, while improves the
> write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
> is restored back to the state before the faulty commit, and even a bit
> higher in RAID tests (which aren't HDD-bound on this device) - that is
> likely related to other optimizations done between the faulty commit and
> 5.10.10 which also improved the read speed.

Can't argue with these numbers, and while this should probably get
fixed up instead, let's leave that for future kernels. I'll apply this
for 5.11, thanks.
diff mbox series

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3b8963e228a1..235b5042672e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -130,7 +130,15 @@  EXPORT_SYMBOL(truncate_bdev_range);
 
 static void set_init_blocksize(struct block_device *bdev)
 {
-	bdev->bd_inode->i_blkbits = blksize_bits(bdev_logical_block_size(bdev));
+	unsigned int bsize = bdev_logical_block_size(bdev);
+	loff_t size = i_size_read(bdev->bd_inode);
+
+	while (bsize < PAGE_SIZE) {
+		if (size & bsize)
+			break;
+		bsize <<= 1;
+	}
+	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
 }
 
 int set_blocksize(struct block_device *bdev, int size)