Message ID | cover.1584728740.git.zhangweiping@didiglobal.com (mailing list archive)
---|---
Series | blkcg: add blk-iotrack
On Sat, Mar 21, 2020 at 09:20:36AM +0800, Weiping Zhang wrote:
> The user space tool, called iotrack, is used to collect these basic io
> statistics and then generate more valuable metrics at the cgroup level.
> From iotrack, you can get a cgroup's percentage of io, bytes, total_time
> and disk_time across the whole disk. This makes it easy to evaluate the
> real weight under a weight-based policy (bfq, blk-iocost). iotrack
> generates many metrics for read and write; for more details, please
> visit: https://github.com/dublio/iotrack.
>
> Test result for two fio jobs doing 4K random reads:
> test1 cgroup bfq weight = 800
> test2 cgroup bfq weight = 100
>
> Device        io/s     MB/s    %io    %MB    %tm   %dtm   %d2c  %hit0  %hit1  %hit2  %hit3  %hit4  %hit5  %hit6  %hit7  cgroup
> nvme1n1   44588.00   174.17 100.00 100.00 100.00 100.00  38.46   0.25  45.27  95.90  98.33  99.47  99.85  99.92  99.95  /
> nvme1n1   30206.00   117.99  67.74  67.74  29.44  67.29  87.90   0.35  47.82  99.22  99.98  99.99  99.99 100.00 100.00  /test1
> nvme1n1   14370.00    56.13  32.23  32.23  70.55  32.69  17.82   0.03  39.89  88.92  94.88  98.37  99.53  99.77  99.85  /test2

Maybe this'd be better done with bpf?
Tejun Heo <tj@kernel.org> wrote on Wed, Mar 25, 2020 at 2:27 AM:
>
> On Sat, Mar 21, 2020 at 09:20:36AM +0800, Weiping Zhang wrote:
> > The user space tool, called iotrack, is used to collect these basic io
> > statistics and then generate more valuable metrics at the cgroup level.
> > [...]
>
> Maybe this'd be better done with bpf?
>
Hi Tejun,

How about supporting both iotrack and bpf? If that's OK, I'd like to add bpf
support in another patchset. I saw that iocost_monitor.py is based on drgn;
maybe I can write a new drgn-based script, "biotrack".

For this patchset, iotrack works well; I'm using it to monitor block cgroups
in order to select a proper io isolation policy.

Thanks a ton
Weiping

> --
> tejun
On Wed, Mar 25, 2020 at 08:49:24PM +0800, Weiping Zhang wrote:
> For this patchset, iotrack works well; I'm using it to monitor block cgroups
> in order to select a proper io isolation policy.

Yeah, I get that, but monitoring needs tend to diverge quite a bit depending
on the use case, so detailed monitoring often needs a fair bit of flexibility,
and I'm a bit skeptical about adding a fixed controller for that. I think a
better approach may be implementing features which make dynamic monitoring,
whether through bpf, drgn or plain tracepoints, easier and more efficient.

Thanks.
Tejun Heo <tj@kernel.org> wrote on Wed, Mar 25, 2020 at 10:12 PM:
>
> On Wed, Mar 25, 2020 at 08:49:24PM +0800, Weiping Zhang wrote:
> > For this patchset, iotrack works well; I'm using it to monitor block cgroups
> > in order to select a proper io isolation policy.
>
> Yeah, I get that, but monitoring needs tend to diverge quite a bit depending
> on the use case, so detailed monitoring often needs a fair bit of flexibility,
> and I'm a bit skeptical about adding a fixed controller for that. I think a
> better approach may be implementing features which make dynamic monitoring,
> whether through bpf, drgn or plain tracepoints, easier and more efficient.
>
I agree with you; there are lots of io metrics needed in a real production
system. The more flexible way is to export every bio structure member over a
bio's whole life to userspace without recompiling the kernel, which is what
bpf can do.

The main block cgroup isolation policies today are blk-iocost and bfq, which
are weight based, and blk-iolatency, which is latency based. blk-iotrack can
track each cgroup's real share of IOs, kB, on-disk time (d2c) and total time,
which is a good indicator of its real weight. For blk-iolatency, blk-iotrack
has 8 latency thresholds to show the latency distribution, so if we set these
thresholds around blk-iolatency's target latency, we can tune the target
latency to a more proper value.

blk-iotrack extends the basic io.stat. It just exports the important basic io
statistics per cgroup, like what /proc/diskstats does for a block device, and
it is easy to program against; iotrack works just like iostat, but focused on
cgroups.

blk-iotrack is friendly to these block cgroup isolation policies, as an
indicator for cgroup weight and latency.

Thanks
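To make the metrics above concrete, here is a minimal sketch of how per-cgroup
counters could be turned into the %io share and the cumulative %hit0..%hit7
latency-bucket columns shown in the earlier table. The counter layout and the
threshold values are illustrative assumptions, not the actual io.iotrack.stat
format from the patchset.

```python
# Hypothetical per-cgroup counters; the real blk-iotrack format may differ.
# lat_bucket_counts[i] = IOs whose latency fell into bucket i, where bucket i
# covers latencies up to THRESHOLDS_US[i]. Threshold values are made up.
THRESHOLDS_US = [50, 100, 200, 400, 1000, 2000, 4000, 8000]  # 8 assumed buckets

def share_percent(cgroup_ios, disk_ios):
    """%io column: this cgroup's fraction of all IOs completed on the disk."""
    return 100.0 * cgroup_ios / disk_ios if disk_ios else 0.0

def hit_percentiles(lat_bucket_counts, total_ios):
    """%hit0..%hit7 columns: cumulative share of IOs at or below each threshold."""
    hits, acc = [], 0
    for count in lat_bucket_counts:
        acc += count
        hits.append(100.0 * acc / total_ios if total_ios else 0.0)
    return hits

# Example with made-up bucket counts, in the spirit of the table above:
print(share_percent(30206, 44588))   # ~67.74, the %io for /test1
print(hit_percentiles([100, 14000, 15500, 300, 200, 50, 30, 20], 30206))
```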
Weiping Zhang <zwp10758@gmail.com> wrote on Thu, Mar 26, 2020 at 12:45 AM:
>
> Tejun Heo <tj@kernel.org> wrote on Wed, Mar 25, 2020 at 10:12 PM:
> > [...]
> > I think a better approach may be implementing features which make dynamic
> > monitoring, whether through bpf, drgn or plain tracepoints, easier and
> > more efficient.
> >
> [...]
> blk-iotrack is friendly to these block cgroup isolation policies, as an
> indicator for cgroup weight and latency.
>
Hi Tejun,

I did a test on cgroup v2 and monitored it with iotrack, then compared fio's
output against iotrack's; they match well.

cgroup weight test:
    /sys/fs/cgroup/test1
    /sys/fs/cgroup/test2
    test1.weight : test2.weight = 8 : 1

I ran a randread-4K fio test under each of three policies: iocost, bfq and
nvme-wrr. For blk-iocost I ran "iocost_coef_gen.py" and wrote the result to
"io.cost.model":

    259:0 ctrl=user model=linear rbps=3286476297 rseqiops=547837 rrandiops=793881 wbps=2001272356 wseqiops=482243 wrandiops=483037

But iocost_test1 cannot get 8/(8+1) of the iops, and the total disk iops is
737559 < 793881, even if I change rrandiops to 637000.

test case           bw     iops  rd_avg_lat  rd_p99_lat
========================================================
iocost_test1   1550478   387619      659.76     1662.00
iocost_test2   1399761   349940      730.83     1712.00
wrr_test1      2618185   654546      390.59     1187.00
wrr_test2       362613    90653     2822.62     4358.00
bfq_test1       714127   178531     1432.43      489.00
bfq_test2       178821    44705     5721.76      552.00

The detailed test report can be found at:
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test

> Thanks
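For reference, a rough sketch of the cgroup v2 setup described above. The
io.cost.model line is the one quoted in the mail; the file names and the
per-cgroup weight interface (io.weight here; bfq exposes io.bfq.weight on some
kernels) depend on kernel version and the active policy, so treat this as
illustrative only.

```python
import os

CGROOT = "/sys/fs/cgroup"  # assumes a standard cgroup2 mount

# Device-wide iocost cost model, as produced by iocost_coef_gen.py in the mail.
COST_MODEL = ("259:0 ctrl=user model=linear rbps=3286476297 rseqiops=547837 "
              "rrandiops=793881 wbps=2001272356 wseqiops=482243 wrandiops=483037")

def write(path, value):
    with open(path, "w") as f:
        f.write(value)

def setup_weight_test():
    # io.cost.model is a root-only file.
    write(os.path.join(CGROOT, "io.cost.model"), COST_MODEL)
    # Create the two test cgroups with an 8 : 1 weight ratio.
    for name, weight in (("test1", 800), ("test2", 100)):
        cg = os.path.join(CGROOT, name)
        os.makedirs(cg, exist_ok=True)
        write(os.path.join(cg, "io.weight"), f"default {weight}")

if __name__ == "__main__":
    setup_weight_test()
```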
On Thu, Mar 26, 2020 at 11:08:45PM +0800, Weiping Zhang wrote:
> But iocost_test1 cannot get 8/(8+1) of the iops, and the total disk iops is
> 737559 < 793881, even if I change rrandiops to 637000.

iocost needs QoS targets set, especially for deep-queue devices. Without QoS
targets, it only throttles when the queue depth is saturated, which might not
happen at all depending on the fio job params.

Can you try with something like the following in io.cost.qos?

    259:0 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=50.00 wlat=10000

In case you see significant bw loss, step up the r/wlat params.

Thanks.
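A minimal sketch of applying the suggested QoS line; it assumes a standard
cgroup2 mount at /sys/fs/cgroup, where io.cost.qos is a root-only file.

```python
# Apply the iocost QoS targets suggested above to the root cgroup.
QOS = "259:0 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=50.00 wlat=10000"

with open("/sys/fs/cgroup/io.cost.qos", "w") as f:
    f.write(QOS)

# Read the file back to confirm the parameters took effect.
print(open("/sys/fs/cgroup/io.cost.qos").read())
```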
Tejun Heo <tj@kernel.org> wrote on Fri, Mar 27, 2020 at 12:15 AM:
>
> On Thu, Mar 26, 2020 at 11:08:45PM +0800, Weiping Zhang wrote:
> > But iocost_test1 cannot get 8/(8+1) of the iops, and the total disk iops is
> > 737559 < 793881, even if I change rrandiops to 637000.
>
> iocost needs QoS targets set, especially for deep-queue devices. Without QoS
> targets, it only throttles when the queue depth is saturated, which might not
> happen at all depending on the fio job params.
>
> Can you try with something like the following in io.cost.qos?
>
>     259:0 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=50.00 wlat=10000
>
> In case you see significant bw loss, step up the r/wlat params.
>
OK, I'll try it.

I would really appreciate it if you could help review blk-iotrack.c, or should
I just drop io.iotrack.stat and append these statistics to io.stat? I think
these metrics are useful, and they just extend the io.stat output.

Thanks a ton
Hello, Weiping.

On Fri, Mar 27, 2020 at 12:27:11AM +0800, Weiping Zhang wrote:
> I would really appreciate it if you could help review blk-iotrack.c, or should
> I just drop io.iotrack.stat and append these statistics to io.stat? I think
> these metrics are useful,

So, the problem is that you can get the exact same information, and easily
more, using bpf. There definitely are benefits to baking some statistics into
the kernel in terms of overhead and accessibility, but I'm not sure what's
being proposed is generic and/or flexible enough to bake into the interface at
this point.

Something which could be immediately useful would be cgroup-aware bpf progs
which expose these statistics. Can you please take a look at the following?

https://github.com/iovisor/bcc/blob/master/tools/biolatency.py
https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt

They aren't cgroup-aware but can be made so, and they can provide a lot more
detailed statistics than something we can hardcode into the kernel.

Thanks.
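As an illustration of the kind of cgroup-aware bpf monitoring being suggested,
here is a rough sketch loosely modeled on bcc's biolatency.py. It is not code
from the patchset or from bcc: the kprobe targets (blk_account_io_start/done)
vary across kernel versions and may be inlined, bpf_get_current_cgroup_id()
needs kernel 4.18+, and the cgroup is attributed from the submitting task's
context at issue time, so treat every detail below as an assumption.

```python
#!/usr/bin/env python3
# Sketch: per-cgroup block I/O latency histograms with bcc.
from bcc import BPF
from time import sleep

bpf_text = r"""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

typedef struct start_val {
    u64 ts;     // issue timestamp
    u64 cgid;   // cgroup id of the submitting task
} start_val_t;

typedef struct hist_key {
    u64 cgid;
    u64 slot;   // log2 latency bucket
} hist_key_t;

BPF_HASH(start, struct request *, start_val_t);
BPF_HISTOGRAM(dist, hist_key_t);

int trace_req_start(struct pt_regs *ctx, struct request *req)
{
    start_val_t val = {};
    val.ts = bpf_ktime_get_ns();
    val.cgid = bpf_get_current_cgroup_id();  // attributed at submission time
    start.update(&req, &val);
    return 0;
}

int trace_req_done(struct pt_regs *ctx, struct request *req)
{
    start_val_t *valp = start.lookup(&req);
    if (valp == 0)
        return 0;
    u64 delta_us = (bpf_ktime_get_ns() - valp->ts) / 1000;
    hist_key_t key = {};
    key.cgid = valp->cgid;
    key.slot = bpf_log2l(delta_us);
    dist.increment(key);
    start.delete(&req);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_req_done")

print("Tracing block I/O latency per cgroup... Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass

# One log2 histogram section per cgroup id; map the id back to a cgroup path
# via /sys/fs/cgroup if needed.
b["dist"].print_log2_hist("usecs", "cgroup id")
```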