[V2,0/1] brd: use memcpy_to_page() in copy_to_brd()

Message ID 20230410201938.59122-1-kch@nvidia.com (mailing list archive)

Chaitanya Kulkarni April 10, 2023, 8:19 p.m. UTC
From include/linux/highmem.h:
"kmap_atomic - Atomically map a page for temporary usage - Deprecated!"

Use memcpy_to_page() and memcpy_from_page() since they do the same job of
mapping, copying, and unmapping, except that they use the non-deprecated
kmap_local_page() and kunmap_local(). Following are the differences
between kmap_local_page() and kmap_atomic() :-

* creates local mapping per thread, local to CPU & not globally visible
* can be called from any context
* allows task preemption
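
For readers who have not used these helpers, the following is a minimal
user-space sketch of the pattern being replaced, not the kernel code
itself. Everything prefixed with mock_ or MOCK_, as well as
copy_open_coded(), is a hypothetical stand-in made up for illustration;
the real helpers live in include/linux/highmem.h and do more than this
model (for example, real mapping and cache maintenance where needed).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Hypothetical stand-in for the kernel's struct page: just a buffer. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
};

/*
 * Stand-ins for kmap_local_page()/kunmap_local(). In user space the
 * "mapping" is simply the address of the buffer.
 */
static void *mock_kmap_local_page(struct mock_page *page)
{
	return page->data;
}

static void mock_kunmap_local(void *addr)
{
	(void)addr; /* nothing to undo in this model */
}

/* Open-coded pattern: the caller maps, copies, and unmaps by hand. */
static void copy_open_coded(struct mock_page *page, size_t offset,
			    const void *src, size_t len)
{
	unsigned char *dst = mock_kmap_local_page(page);

	memcpy(dst + offset, src, len);
	mock_kunmap_local(dst);
}

/* Helper pattern: one call hides the map/copy/unmap sequence. */
static void mock_memcpy_to_page(struct mock_page *page, size_t offset,
				const void *from, size_t len)
{
	unsigned char *to = mock_kmap_local_page(page);

	assert(offset + len <= MOCK_PAGE_SIZE); /* stay within one page */
	memcpy(to + offset, from, len);
	mock_kunmap_local(to);
}

/* Mirror image of the helper above for the read direction. */
static void mock_memcpy_from_page(void *to, struct mock_page *page,
				  size_t offset, size_t len)
{
	unsigned char *from = mock_kmap_local_page(page);

	assert(offset + len <= MOCK_PAGE_SIZE);
	memcpy(to, from + offset, len);
	mock_kunmap_local(from);
}
```

In this model both write paths leave the page in the same state; the
helper only removes the repeated boilerplate from each call site.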

Performance numbers from V1 should apply as is, since there is not even
a single line of change in this version, which only combines
all the patches into one :-

There is a slight performance difference observed with the use of the new
API on the one arch I've tested, with two different sets :-

Set 1 (Average of 3 runs) :-
-----------------------------
* Latency (lower is better)   :- ~14 higher with this patch series
* IOPS/BW (higher is better)  :- ~47k higher with this patch series
* CPU Usage (lower is better) :- approximately the same

Set 2 (Average of 3 runs) :-
-----------------------------
* Latency (lower is better)   :- ~9 higher with this patch series
* IOPS/BW (higher is better)  :- ~23k higher with this patch series
* CPU Usage (lower is better) :- approximately the same

Below is the test for the fio verification job and perf numbers on brd.

In case someone shows up with a performance regression on an arch that
I don't have access to, we can decide then whether to drop this or keep
using the deprecated kernel API, but I think removing the deprecated
API is useful in the long term anyway.

-ck

v2:-

Merge all the patches into a single patch.
No functional change from V1.

Chaitanya Kulkarni (1):
  brd: use memcpy_to|from_page() in copy_to|from_brd()

 drivers/block/brd.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)


fio verify job output:

linux-block (brd-memcpy) # git log -1
commit ea45fcc44031dc56055b194f0792fb2230caba00 (HEAD -> brd-memcpy)
Author: Chaitanya Kulkarni <kch@nvidia.com>
Date:   Sun Apr 9 15:14:01 2023 -0700

    brd: use memcpy_xxx_page() lib functions

    "kmap_atomic - Atomically map a page for temporary usage - Deprecated!"

    Use memcpy_from_page() helper that does same job of mapping and copying
    buffer that is open-coded in copy_from_brd() except the library function
    also uses non deprecated kmap_local_page() and kunmap_local() instead
    of kmap() and kunmap() in current code.

    Use memcpy_to_page() helper that does same job of mapping and copying
    buffer that is open-coded in copy_to_brd() except the library function
    also uses non deprecated kmap_local_page() and kunmap_local() instead
    of kmap() and kunmap() in current code.

    Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
linux-block (brd-memcpy) # ./compile_brd.sh 
+ umount /mnt/brd
umount: /mnt/brd: not mounted.
+ dmesg -c
+ modprobe -r brd
+ lsmod
+ grep brd
++ nproc
+ make -j 48 M=drivers/block modules
  CC [M]  drivers/block/floppy.o
  CC [M]  drivers/block/brd.o
  CC [M]  drivers/block/loop.o
  CC [M]  drivers/block/nbd.o
  CC [M]  drivers/block/virtio_blk.o
  CC [M]  drivers/block/xen-blkfront.o
  CC [M]  drivers/block/rbd.o
  CC [M]  drivers/block/mtip32xx/mtip32xx.o
  CC [M]  drivers/block/xen-blkback/blkback.o
  CC [M]  drivers/block/zram/zram_drv.o
  CC [M]  drivers/block/xen-blkback/xenbus.o
  CC [M]  drivers/block/null_blk/main.o
  CC [M]  drivers/block/null_blk/trace.o
  CC [M]  drivers/block/null_blk/zoned.o
  CC [M]  drivers/block/drbd/drbd_bitmap.o
  CC [M]  drivers/block/drbd/drbd_proc.o
  CC [M]  drivers/block/drbd/drbd_worker.o
  CC [M]  drivers/block/drbd/drbd_receiver.o
  CC [M]  drivers/block/drbd/drbd_req.o
  CC [M]  drivers/block/drbd/drbd_actlog.o
  CC [M]  drivers/block/drbd/drbd_main.o
  CC [M]  drivers/block/drbd/drbd_nl.o
  CC [M]  drivers/block/drbd/drbd_state.o
  CC [M]  drivers/block/drbd/drbd_nla.o
  CC [M]  drivers/block/drbd/drbd_debugfs.o
  LD [M]  drivers/block/zram/zram.o
  LD [M]  drivers/block/xen-blkback/xen-blkback.o
  LD [M]  drivers/block/null_blk/null_blk.o
  LD [M]  drivers/block/drbd/drbd.o
  MODPOST drivers/block/Module.symvers
  LD [M]  drivers/block/floppy.ko
  LD [M]  drivers/block/brd.ko
  LD [M]  drivers/block/loop.ko
  LD [M]  drivers/block/nbd.ko
  LD [M]  drivers/block/virtio_blk.ko
  LD [M]  drivers/block/xen-blkfront.ko
  LD [M]  drivers/block/xen-blkback/xen-blkback.ko
  LD [M]  drivers/block/drbd/drbd.ko
  LD [M]  drivers/block/rbd.ko
  LD [M]  drivers/block/mtip32xx/mtip32xx.ko
  LD [M]  drivers/block/zram/zram.ko
  LD [M]  drivers/block/null_blk/null_blk.ko
+ HOST=drivers/block/brd.ko
++ uname -r
+ HOST_DEST=/lib/modules/6.3.0-rc5lblk+/kernel/drivers/block/
+ cp drivers/block/brd.ko /lib/modules/6.3.0-rc5lblk+/kernel/drivers/block//
+ ls -lrth /lib/modules/6.3.0-rc5lblk+/kernel/drivers/block//brd.ko
-rw-r--r--. 1 root root 375K Apr 10 13:17 /lib/modules/6.3.0-rc5lblk+/kernel/drivers/block//brd.ko
+ dmesg -c
[81687.581471] brd: module unloaded
+ lsmod
+ grep brd
linux-block (brd-memcpy) # modprobe brd rd_size=$((70*1024*1204)) rd_nr=1; ls /dev/ram0
/dev/ram0
linux-block (brd-memcpy) # cat fio/verify.fio
[write-and-verify]
rw=randwrite
bs=4k
direct=1
ioengine=libaio
iodepth=16
norandommap
randrepeat=0
verify=crc32c
size=15G
allow_file_create=0
group_reporting
linux-block (brd-memcpy) # fio --filename= /dev/ram0
fio: option filename requires an argument
linux-block (brd-memcpy) # fio fio/verify.fio  --filename=/dev/ram0
write-and-verify: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
fio-3.27
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][r=1058MiB/s][r=271k IOPS][eta 00m:00s]
write-and-verify: (groupid=0, jobs=1): err= 0: pid=57965: Mon Apr 10 13:16:54 2023
  read: IOPS=390k, BW=1522MiB/s (1596MB/s)(9710MiB/6381msec)
    slat (nsec): min=1152, max=70725, avg=1481.93, stdev=397.01
    clat (nsec): min=1092, max=224095, avg=38774.90, stdev=2190.45
     lat (usec): min=2, max=225, avg=40.30, stdev= 2.23
    clat percentiles (nsec):
     |  1.00th=[37120],  5.00th=[37120], 10.00th=[37632], 20.00th=[37632],
     | 30.00th=[38144], 40.00th=[38144], 50.00th=[38144], 60.00th=[38656],
     | 70.00th=[38656], 80.00th=[39168], 90.00th=[39680], 95.00th=[43264],
     | 99.00th=[47360], 99.50th=[49920], 99.90th=[52480], 99.95th=[54528],
     | 99.99th=[85504]
  write: IOPS=162k, BW=634MiB/s (665MB/s)(15.0GiB/24209msec); 0 zone resets
    slat (usec): min=2, max=744, avg= 5.54, stdev= 2.45
    clat (nsec): min=1002, max=843151, avg=92648.20, stdev=18028.23
     lat (usec): min=5, max=848, avg=98.24, stdev=19.04
    clat percentiles (usec):
     |  1.00th=[   64],  5.00th=[   72], 10.00th=[   76], 20.00th=[   79],
     | 30.00th=[   82], 40.00th=[   85], 50.00th=[   90], 60.00th=[   93],
     | 70.00th=[   97], 80.00th=[  106], 90.00th=[  120], 95.00th=[  129],
     | 99.00th=[  147], 99.50th=[  155], 99.90th=[  176], 99.95th=[  186],
     | 99.99th=[  206]
   bw (  KiB/s): min=222288, max=761392, per=98.81%, avg=641985.14, stdev=90922.65, samples=49
   iops        : min=55572, max=190348, avg=160496.33, stdev=22730.68, samples=49
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=38.57%
  lat (usec)   : 100=46.17%, 250=15.26%, 500=0.01%, 1000=0.01%
  cpu          : usr=48.58%, sys=51.34%, ctx=17, majf=0, minf=58280
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2485856,3932160,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1522MiB/s (1596MB/s), 1522MiB/s-1522MiB/s (1596MB/s-1596MB/s), io=9710MiB (10.2GB), run=6381-6381msec
  WRITE: bw=634MiB/s (665MB/s), 634MiB/s-634MiB/s (665MB/s-665MB/s), io=15.0GiB (16.1GB), run=24209-24209msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
linux-block (brd-memcpy) #
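
As a sanity check on the fio summary above, the reported IOPS and
bandwidth can be re-derived from the "issued rwts" counts, the 4k block
size, and the run times. The sketch below is only an illustration of
that arithmetic; the function names iops() and bw_mib_per_sec() are
made up here and are not part of fio.

```c
#include <stdint.h>

/* I/O operations per second from an I/O count and a runtime in msec. */
static double iops(uint64_t ios, uint64_t runtime_msec)
{
	return (double)ios * 1000.0 / (double)runtime_msec;
}

/* Bandwidth in MiB/s for fixed-size I/Os of block_bytes each. */
static double bw_mib_per_sec(uint64_t ios, uint64_t block_bytes,
			     uint64_t runtime_msec)
{
	double mib = (double)ios * (double)block_bytes / (1024.0 * 1024.0);

	return mib * 1000.0 / (double)runtime_msec;
}
```

With the values from the log, 2485856 reads in 6381 msec work out to
roughly 390k IOPS and 1522 MiB/s, and 3932160 writes (15G / 4k) in
24209 msec work out to roughly 162k IOPS and 634 MiB/s, which matches
the fio output.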