Message ID: 20161005020153.GA2988@bbox (mailing list archive)
State:      New, archived
Hello Minchan,

On (10/05/16 11:01), Minchan Kim wrote:
[..]
> 1. just changed ordering of test execution - hope to reduce testing time due to
>    block population before the first reading or reading just zero pages
> 2. used sync_on_close instead of direct io
> 3. Don't use perf to avoid noise
> 4. echo 0 > /sys/block/zram0/use_aio to test synchronous IO for old behavior

ok, will use it in the tests below.

> 1. ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=async FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
> 2. modify script to disable aio via /sys/block/zram0/use_aio
>    ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=sync FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
>
>    seq-write      380930   474325   124.52%
>    rand-write     286183   357469   124.91%
>    seq-read       266813   265731    99.59%
>    rand-read      211747   210670    99.49%
>    mixed-seq(R)   145750   171232   117.48%
>    mixed-seq(W)   145736   171215   117.48%
>    mixed-rand(R)  115355   125239   108.57%
>    mixed-rand(W)  115371   125256   108.57%

                   no_aio           use_aio

  WRITE:           1432.9MB/s       1511.5MB/s
  WRITE:           1173.9MB/s       1186.9MB/s
  READ:            912699KB/s       912170KB/s
  WRITE:           912497KB/s       911968KB/s
  READ:            725658KB/s       726747KB/s
  READ:            579003KB/s       594543KB/s
  READ:            373276KB/s       373719KB/s
  WRITE:           373572KB/s       374016KB/s

  seconds elapsed  45.399702511     44.280199716

> LZO compression is fast and a CPU for queueing while 3 CPU for compressing
> it cannot saturate CPU full bandwidth. Nonetheless, it shows 24% enhancement.
> It could be more in slow CPU like embedded.
>
> I tested it with deflate. The result is 300% enhancement.
>
>    seq-write       33598   109882   327.05%
>    rand-write      32815   102293   311.73%
>    seq-read       154323   153765    99.64%
>    rand-read      129978   129241    99.43%
>    mixed-seq(R)    15887    44995   283.22%
>    mixed-seq(W)    15885    44990   283.22%
>    mixed-rand(R)   25074    55491   221.31%
>    mixed-rand(W)   25078    55499   221.31%
>
> So, curious with your test.
> Am my test sync with yours? If you cannot see enhancment in job1, could
> you test with deflate? It seems your CPU is really fast.

interesting observation.

                   no_aio          use_aio

  WRITE:           47882KB/s       158931KB/s
  WRITE:           47714KB/s       156484KB/s
  READ:            42914KB/s       137997KB/s
  WRITE:           42904KB/s       137967KB/s
  READ:            333764KB/s      332828KB/s
  READ:            293883KB/s      294709KB/s
  READ:            51243KB/s       129701KB/s
  WRITE:           51284KB/s       129804KB/s

  seconds elapsed  480.869169882   181.678431855

yes, it looks like with lzo the CPU manages to process bdi writeback fast
enough to keep the fio-template-static-buffer workers active.

to prove this theory: direct=1 cures zram-deflate.

                   no_aio          use_aio

  WRITE:           41873KB/s       34257KB/s
  WRITE:           41455KB/s       34087KB/s
  READ:            36705KB/s       28960KB/s
  WRITE:           36697KB/s       28954KB/s
  READ:            327902KB/s      327270KB/s
  READ:            316217KB/s      316886KB/s
  READ:            35980KB/s       28131KB/s
  WRITE:           36008KB/s       28153KB/s

  seconds elapsed  515.575252170   629.114626795

as soon as the wb flush kworker can't keep up anymore, things go off
the rails. most of the time the fio-template-static-buffer workers are
in D state, while the biggest bdi flush kworker is doing the job (a lot
of job):

    PID USER  PR NI   VIRT  RES  %CPU %MEM  TIME+   S COMMAND
   6274 root  20  0   0.0m 0.0m 100.0  0.0  1:15.60 R [kworker/u8:1]
  11169 root  20  0 718.1m 1.6m  16.6  0.0  0:01.88 D fio ././conf/fio-template-static-buffer
  11171 root  20  0 718.1m 1.6m   3.3  0.0  0:01.15 D fio ././conf/fio-template-static-buffer
  11170 root  20  0 718.1m 3.3m   2.6  0.1  0:00.98 D fio ././conf/fio-template-static-buffer

and still working...

   6274 root  20  0   0.0m 0.0m 100.0  0.0  3:05.49 R [kworker/u8:1]
  12048 root  20  0 718.1m 1.6m  16.7  0.0  0:01.80 R fio ././conf/fio-template-static-buffer
  12047 root  20  0 718.1m 1.6m   3.3  0.0  0:01.12 D fio ././conf/fio-template-static-buffer
  12049 root  20  0 718.1m 1.6m   3.3  0.0  0:01.12 D fio ././conf/fio-template-static-buffer
  12050 root  20  0 718.1m 1.6m   2.0  0.0  0:00.98 D fio ././conf/fio-template-static-buffer

and working...

[ 4159.338731] CPU: 0 PID: 105 Comm: kworker/u8:4
[ 4159.338734] Workqueue: writeback wb_workfn (flush-254:0)
[ 4159.338746]  [<ffffffffa01d8cff>] zram_make_request+0x4a3/0x67b [zram]
[ 4159.338748]  [<ffffffff810543fe>] ? try_to_wake_up+0x201/0x213
[ 4159.338750]  [<ffffffff810ae9d3>] ? mempool_alloc+0x5e/0x124
[ 4159.338752]  [<ffffffff811a9922>] generic_make_request+0xb8/0x156
[ 4159.338753]  [<ffffffff811a9aaf>] submit_bio+0xef/0xf8
[ 4159.338755]  [<ffffffff81121a97>] submit_bh_wbc.isra.10+0x16b/0x178
[ 4159.338757]  [<ffffffff811223ec>] __block_write_full_page+0x1b2/0x2a6
[ 4159.338758]  [<ffffffff8112403e>] ? bh_submit_read+0x5a/0x5a
[ 4159.338760]  [<ffffffff81120f9a>] ? end_buffer_write_sync+0x36/0x36
[ 4159.338761]  [<ffffffff8112403e>] ? bh_submit_read+0x5a/0x5a
[ 4159.338763]  [<ffffffff811226d8>] block_write_full_page+0xf6/0xff
[ 4159.338765]  [<ffffffff81124342>] blkdev_writepage+0x13/0x15
[ 4159.338767]  [<ffffffff810b498c>] __writepage+0xe/0x26
[ 4159.338768]  [<ffffffff810b65aa>] write_cache_pages+0x28c/0x376
[ 4159.338770]  [<ffffffff810b497e>] ? __wb_calc_thresh+0x83/0x83
[ 4159.338772]  [<ffffffff810b66dc>] generic_writepages+0x48/0x67
[ 4159.338773]  [<ffffffff81124318>] blkdev_writepages+0x9/0xb
[ 4159.338775]  [<ffffffff81124318>] ? blkdev_writepages+0x9/0xb
[ 4159.338776]  [<ffffffff810b6716>] do_writepages+0x1b/0x24
[ 4159.338778]  [<ffffffff8111b12c>] __writeback_single_inode+0x3d/0x155
[ 4159.338779]  [<ffffffff8111b407>] writeback_sb_inodes+0x1c3/0x32c
[ 4159.338781]  [<ffffffff8111b5e1>] __writeback_inodes_wb+0x71/0xa9
[ 4159.338783]  [<ffffffff8111b7ce>] wb_writeback+0x10f/0x1a1
[ 4159.338785]  [<ffffffff8111be32>] wb_workfn+0x1c9/0x24c
[ 4159.338786]  [<ffffffff8111be32>] ? wb_workfn+0x1c9/0x24c
[ 4159.338788]  [<ffffffff8104a2e2>] process_one_work+0x1a4/0x2a7
[ 4159.338790]  [<ffffffff8104ae32>] worker_thread+0x23b/0x37c
[ 4159.338792]  [<ffffffff8104abf7>] ? rescuer_thread+0x2eb/0x2eb
[ 4159.338793]  [<ffffffff8104f285>] kthread+0xce/0xd6
[ 4159.338794]  [<ffffffff8104f1b7>] ? kthread_create_on_node+0x1ad/0x1ad
[ 4159.338796]  [<ffffffff8145ad12>] ret_from_fork+0x22/0x30

so the question is -- can we move this parallelization out of zram
and instead flush the bdi in more than one kthread? how bad would that
be? can anyone else benefit from this?

[1] https://lwn.net/Articles/353844/
[2] https://lwn.net/Articles/354852/

	-ss
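For reference, the A/B comparison above boils down to toggling the
use_aio knob and re-running the same fio template; a condensed sketch of
what zram-fio-test.sh does, assuming the sysfs knobs discussed in the
thread and the environment variables the script passes to the job file
(NUMJOBS/NRFILES values here are illustrative):

  # set up a 3G zram device with the chosen compressor
  modprobe zram num_devices=1
  echo lzo > /sys/block/zram0/comp_algorithm    # or: echo deflate
  echo 3G  > /sys/block/zram0/disksize

  # old synchronous behavior
  echo 0 > /sys/block/zram0/use_aio
  BLOCK_SIZE=4 SIZE=100% NUMJOBS=3 NRFILES=3 FIO_LOOPS=2 \
          fio ./conf/fio-template-static-buffer

  # patched asynchronous behavior
  echo 1 > /sys/block/zram0/use_aio
  BLOCK_SIZE=4 SIZE=100% NUMJOBS=3 NRFILES=3 FIO_LOOPS=2 \
          fio ./conf/fio-template-static-buffer

  # reset the device between runs
  echo 1 > /sys/block/zram0/reset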
Hi Sergey,

On Thu, Oct 06, 2016 at 05:29:15PM +0900, Sergey Senozhatsky wrote:

< snip >

> so the question is -- can we move this parallelization out of zram
> and instead flush the bdi in more than one kthread? how bad would that
> be? can anyone else benefit from this?

Isn't it blk-mq you mentioned? With blk-mq, I have some concerns:

1. read speed degradation
2. no support for rw_page
3. a bigger memory footprint due to bio/request queue allocation

Having said that, it's worth looking into in more detail. I will take
some time to study that approach and see what I can do with it.

Thanks!
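Not from the thread, but one way to see the bio-based vs blk-mq
distinction behind concerns 2 and 3 on a 4.x-era kernel is to check for
the per-disk mq/ sysfs directory that blk-mq devices expose (device
names below are just examples):

  # bio-based zram: no blk-mq hardware contexts, no struct request allocation
  ls /sys/block/zram0/mq 2>/dev/null || echo "zram0: bio-based"

  # a blk-mq device (e.g. null_blk loaded with queue_mode=2) lists its hw queues
  ls /sys/block/nullb0/mq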
Hello Minchan,

On (10/07/16 15:33), Minchan Kim wrote:
[..]
> Isn't it blk-mq you mentioned? With blk-mq, I have some concerns:
>
> 1. read speed degradation
> 2. no support for rw_page
> 3. a bigger memory footprint due to bio/request queue allocation

yes, I did. and I've seen your concerns in another email - I just
don't have enough knowledge at the moment to say something not
entirely stupid. gotta look more at the whole thing.

> Having said that, it's worth looking into in more detail. I will take
> some time to study that approach and see what I can do with it.

thanks a lot! will keep looking as well.

	-ss
Hi Sergey,

On Fri, Oct 07, 2016 at 03:33:22PM +0900, Minchan Kim wrote:

< snip >

> > so the question is -- can we move this parallelization out of zram
> > and instead flush the bdi in more than one kthread? how bad would that
> > be? can anyone else benefit from this?
>
> Isn't it blk-mq you mentioned? With blk-mq, I have some concerns:
>
> 1. read speed degradation
> 2. no support for rw_page
> 3. a bigger memory footprint due to bio/request queue allocation
>
> Having said that, it's worth looking into in more detail. I will take
> some time to study that approach and see what I can do with it.

queue_mode=2 bs=4096 nr_devices=1 submit_queues=4 hw_queue_depth=128

Last week I played with null_blk and blk-mq.c to get an idea of how
blk-mq works, and I realized it's not a good fit for zram, because
blk-mq aims to solve 1) the dispatch queue bottleneck and 2)
cache-friendly IO completion through IRQs, thereby 3) avoiding remote
memory accesses.

For zram, whose primary use case is embedded systems, the points listed
above are not a severe problem. The most important thing is that there
is no model to support a process queueing an IO request on *a* CPU while
other CPUs issue the queued IO to the driver.

Anyway, although blk-mq can support that model, it is a block-layer
thing. IOW, it's software infrastructure for fast IO delivery, but what
we need is device parallelism in zram itself. So even if we follow
blk-mq, we still need multiple threads to compress in parallel, which is
most of the code I wrote in this patchset.

If I cannot get a huge benefit (e.g., removing a lot of the zram-specific
code needed to support such a model) with blk-mq, I don't feel like
switching to the request model at the cost of the reasons I stated above.

Thanks.
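For reference, a minimal sketch of the null_blk experiment described
above, using the module parameters quoted in the mail; the fio job
parameters and the 30s runtime are illustrative, not taken from the
thread:

  # load null_blk in blk-mq mode with the quoted parameters
  modprobe null_blk queue_mode=2 bs=4096 nr_devices=1 \
          submit_queues=4 hw_queue_depth=128

  # quick sanity run against the emulated device
  fio --name=nullb-randwrite --filename=/dev/nullb0 --rw=randwrite \
      --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 --direct=1 \
      --runtime=30 --time_based --group_reporting

  # unload when done
  rmmod null_blk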
Hello Minchan,

On (10/17/16 14:04), Minchan Kim wrote:
[..]
> Anyway, although blk-mq can support that model, it is a block-layer
> thing. IOW, it's software infrastructure for fast IO delivery, but what
> we need is device parallelism in zram itself. So even if we follow
> blk-mq, we still need multiple threads to compress in parallel, which is
> most of the code I wrote in this patchset.

yes. but at least wb can be multi-threaded. well, sort of. seems like.
sometimes.

> If I cannot get a huge benefit (e.g., removing a lot of the zram-specific
> code needed to support such a model) with blk-mq, I don't feel like
> switching to the request model at the cost of the reasons I stated above.

thanks.
I'm looking at your patches.

	-ss
On Fri, Oct 21, 2016 at 03:08:09PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,

< snip >

> yes. but at least wb can be multi-threaded. well, sort of. seems like.
> sometimes.

Maybe, but that would be a rather greedy approach for zram, because zram
would do real IO (especially compression, which consumes a lot of time)
in that context, even though the context is a shared resource of all
processes in the system.

> thanks.
> I'm looking at your patches.

I found a subtle bug in my patchset, so I will resend it after hunting
that down, along with fixing the bug you found.

Thanks, Sergey!
diff --git a/conf/fio-template-static-buffer b/conf/fio-template-static-buffer
index 1a9a473..22ddee8 100644
--- a/conf/fio-template-static-buffer
+++ b/conf/fio-template-static-buffer
@@ -1,7 +1,7 @@
 [global]
 bs=${BLOCK_SIZE}k
 ioengine=sync
-direct=1
+fsync_on_close=1
 nrfiles=${NRFILES}
 size=${SIZE}
 numjobs=${NUMJOBS}
@@ -14,18 +14,18 @@
 new_group
 group_reporting
 threads=1
 
-[seq-read]
-rw=read
-
-[rand-read]
-rw=randread
-
 [seq-write]
 rw=write
 
 [rand-write]
 rw=randwrite
 
+[seq-read]
+rw=read
+
+[rand-read]
+rw=randread
+
 [mixed-seq]
 rw=rw
diff --git a/zram-fio-test.sh b/zram-fio-test.sh
index 39c11b3..ca2d065 100755
--- a/zram-fio-test.sh
+++ b/zram-fio-test.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 
 # Sergey Senozhatsky. sergey.senozhatsky@gmail.com
 
@@ -37,6 +37,7 @@ function create_zram
 	echo $ZRAM_COMP_ALG > /sys/block/zram0/comp_algorithm
 	cat /sys/block/zram0/comp_algorithm
 
+	echo 0 > /sys/block/zram0/use_aio
 	echo $ZRAM_SIZE > /sys/block/zram0/disksize
 	if [ $? != 0 ]; then
 		return -1
@@ -137,7 +138,7 @@ function main
 		echo "#jobs$i fio" >> $LOG
 
 		BLOCK_SIZE=4 SIZE=100% NUMJOBS=$i NRFILES=$i FIO_LOOPS=$FIO_LOOPS \
-			$PERF stat -o $LOG-perf-stat $FIO ./$FIO_TEMPLATE >> $LOG
+			$FIO ./$FIO_TEMPLATE > $LOG
 
 		echo -n "perfstat jobs$i" >> $LOG
 		cat $LOG-perf-stat >> $LOG
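For completeness, the patched script is driven exactly like the runs
quoted earlier in the thread; a minimal sketch (the comments about
toggling use_aio are an interpretation of the diff above, not text from
the thread):

  # with the hunk above, create_zram() forces use_aio=0, i.e. the old
  # synchronous path (the "no_aio"/"sync" columns in the results)
  ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=sync FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh

  # for the "use_aio" runs, drop that echo (or write 1 instead of 0) and
  # rerun; deflate can be substituted via ZRAM_COMP_ALG=deflate
  ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=async FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh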