[0/5] Improve zram writeback performance

Message ID: 20230911133430.1824564-1-kernel@pankajraghav.com (mailing list archive)
Pankaj Raghav (Samsung) Sept. 11, 2023, 1:34 p.m. UTC
ZRAM can have a backing device that is used as a writeback target for
pages in RAM. The current writeback code (writeback_store()) issues
synchronous single-page IOs to the backing device.

This series implements IO batching during writeback to a backing
device. The code still does synchronous IOs, but with larger IO sizes
whenever possible. This crosses off one of the TODOs in the
writeback_store() function:
A single page IO would be inefficient for write...

The idea is to batch IOs up to a certain limit before the data is
flushed to the backing device. The batch limit is initially chosen
based on the bdi->io_pages value, with an upper limit of 32 pages
(128k on x86).

Batching reduces the writeback time of 4G of data to an NVMe backing
device from 68 secs to 15 secs (more than a 4x improvement).

The first 3 patches are prep, the 4th patch implements the main logic
for IO batching, and the last patch is another cleanup.

Perf:

$ modprobe zram num_devices=1
$ echo "/dev/nvme0n1" > /sys/block/zram0/backing_dev
$ echo 6G > /sys/block/zram0/disksize
$ fio -iodepth=16 -rw=randwrite -ioengine=io_uring -bs=4k -numjobs=1 -size=4G -filename=/dev/zram0 -name=io_uring_1 > /dev/null
$ echo all > /sys/block/zram0/idle

Without changes:
$ time echo idle > /sys/block/zram0/writeback
real    1m8.648s         (68 secs)
user    0m0.000s
sys     0m24.899s
$ cat /sys/block/zram0/bd_stat
1048576        0  1048576

With changes:
$ time echo idle > /sys/block/zram0/writeback
real    0m15.496s       (15 secs)
user    0m0.000s
sys     0m7.789s
$ cat /sys/block/zram0/bd_stat
1048576        0  1048576

Testing:

A basic end-to-end test (based on Sergey's test flow [1]):
1) Configure zram0 and add an NVMe device as a writeback device
2) Get the sha256sum of a tarball
3) mkfs.ext4 on zram0, cp the tarball onto it
4) Idle writeback
5) cp the tarball from zram0 to another device (rereads the
   written-back pages) and compare the sha256sum again
The sha256sums before and after are verified to be the same.

Writeback limit testing:

1) Configure zram0 and add an NVMe device as a writeback device
2) Set the writeback limit and enable it
3) Run a fio job that crosses the writeback limit
4) Idle writeback
5) Verify that the writeback is limited to the configured value

$ modprobe zram num_devices=1
$ echo "/dev/nvme0n1" > /sys/block/zram0/backing_dev
$ echo 4G > /sys/block/zram0/disksize
$ echo 1 > /sys/block/zram0/writeback_limit_enable
$ echo 1002 > /sys/block/zram0/writeback_limit

$ fio -iodepth=16 -rw=write -ioengine=io_uring -bs=4k -numjobs=1 -size=10M -filename=/dev/zram0 -name=io_uring_1

$ echo all > /sys/block/zram0/idle
$ echo idle > /sys/block/zram0/writeback
$ cat /sys/block/zram0/bd_stat
1002        0     1002

Writeback is limited to the set value.

[1] https://lore.kernel.org/lkml/20230806071601.GB907732@google.com/

Pankaj Raghav (5):
  zram: move index preparation to a separate function in writeback_store
  zram: encapsulate writeback to the backing bdev in a function
  zram: add alloc_block_bdev_range() and free_block_bdev_range()
  zram: batch IOs during writeback to improve performance
  zram: don't overload blk_idx variable in writeback_store()

 drivers/block/zram/zram_drv.c | 318 ++++++++++++++++++++++------------
 1 file changed, 210 insertions(+), 108 deletions(-)


base-commit: 7bc675554773f09d88101bf1ccfc8537dc7c0be9

Comments

Pankaj Raghav Sept. 18, 2023, 1:53 p.m. UTC | #1
Gentle ping Minchan and Sergey.

Regards,
Pankaj

Sergey Senozhatsky Sept. 19, 2023, 12:33 a.m. UTC | #2
On (23/09/18 15:53), Pankaj Raghav wrote:
> Gentle ping Minchan and Sergey.

Hello,

zram writeback is currently under (heavy) rework; the series hasn't
been published yet, but it has been in the making for some time
already.  The biggest change is that zram will support compressed
writeback, that is, writeback of compressed objects, as opposed to the
current design, where zram decompresses pages before writeback.

Minchan will have more details, but I guess we'll need to wait for
that series to land.
Pankaj Raghav Sept. 19, 2023, 2:20 p.m. UTC | #3
On 2023-09-19 02:33, Sergey Senozhatsky wrote:
> On (23/09/18 15:53), Pankaj Raghav wrote:
>> Gentle ping Minchan and Sergey.
> 
> Hello,
> 
> zram writeback is currently under (heavy) rework; the series hasn't
> been published yet, but it has been in the making for some time
> already.  The biggest change is that zram will support compressed
> writeback, that is, writeback of compressed objects, as opposed to the
> current design, where zram decompresses pages before writeback.
> 
Got it. Thanks for the explanation. Compressed writeback also makes
sense, as it will save space on the backing device.

> Minchan will have more details, but I guess we'll need to wait for
> that series to land.

This series may no longer be applicable given the new direction, but I
will keep an eye on the new series from Minchan.
Jassi Brar Sept. 25, 2024, 3:53 p.m. UTC | #4
Hi Sergey, Hi Minchan,

>> Gentle ping Minchan and Sergey.
>
>Hello,
>
>zram writeback is currently under (heavy) rework; the series hasn't
>been published yet, but it has been in the making for some time
>already.  The biggest change is that zram will support compressed
>writeback, that is, writeback of compressed objects, as opposed to the
>current design, where zram decompresses pages before writeback.
>
>Minchan will have more details, but I guess we'll need to wait for
>that series to land.

May I please know where we are with the rework? Is there somewhere I
could look up the compressed-writeback work-in-progress code?

Regards,
Jassi
Sergey Senozhatsky Sept. 26, 2024, 4:33 a.m. UTC | #5
On (24/09/25 10:53), Jassi Brar wrote:
> Hi Sergey, Hi Minchan,
> 
> >> Gentle ping Minchan and Sergey.
> >
> May I please know where we are with the rework? Is there somewhere I
> could look up the compressed-writeback work-in-progress code?

There is no code for that nor any progress that can be shared,
as far as I'm concerned.

The most recent writeback-related patch series (WIP) reworks
how writeback and re-compression select target slots for
post-processing [1].

[1] https://lore.kernel.org/linux-kernel/20240917021020.883356-1-senozhatsky@chromium.org
Sergey Senozhatsky Sept. 26, 2024, 4:41 a.m. UTC | #6
On (23/09/11 15:34), Pankaj Raghav wrote:
> Batching reduces the time of writeback of 4G data to a nvme backing device
> from 68 secs to 15 secs (more than **4x improvement**).

I don't think anyone does that in practice.  Excessive writeback wears
out flash storage, so in practice no one writes back gigabytes of data
all at once; instead, people set daily writeback limits and try to be
flash-storage "friendly", which is especially important if your device
has to have a lifespan of 7 or 10 years.  IOW, writeback is usually
put under such constraints that writeback speed is hardly noticeable.
So I'm not sure that the complexity this patch introduces is
justified, to be honest.
Jassi Brar Sept. 29, 2024, 10:21 p.m. UTC | #7
On Wed, Sep 25, 2024 at 11:34 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/09/25 10:53), Jassi Brar wrote:
> > Hi Sergey, Hi Minchan,
> >
> > >> Gentle ping Minchan and Sergey.
> > >
> > May I please know where we are with the rework? Is there somewhere I
> > could look up the compressed-writeback work-in-progress code?
>
> There is no code for that nor any progress that can be shared,
> as far as I'm concerned.
>
> The most recent writeback-related patch series (WIP) reworks
> how writeback and re-compression select target slots for
> post-processing [1]
>
Thanks for the update, Sergey,

Minchan, if you are not pursuing that patchset anymore, is it possible
to share the last version you had? That could avoid rewriting it from
scratch and, more importantly, straying too far from your preferred
implementation.

Thanks
Jassi