Message ID | 20240118073032.15015-1-xiaobing.li@samsung.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v7] io_uring: Statistics of the true utilization of sq threads. | expand |
On 1/18/24 12:30 AM, Xiaobing Li wrote: > diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c > index 976e9500f651..24a7452ed98e 100644 > --- a/io_uring/fdinfo.c > +++ b/io_uring/fdinfo.c > @@ -64,6 +64,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) > unsigned int sq_shift = 0; > unsigned int sq_entries, cq_entries; > int sq_pid = -1, sq_cpu = -1; > + long long sq_total_time = 0, sq_work_time = 0; > bool has_lock; > unsigned int i; > > @@ -147,10 +148,17 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) > > sq_pid = sq->task_pid; > sq_cpu = sq->sq_cpu; > + struct rusage r; > + > + getrusage(sq->thread, RUSAGE_SELF, &r); > + sq_total_time = r.ru_stime.tv_sec * 1000000 + r.ru_stime.tv_usec; > + sq_work_time = sq->work_time; > } I guess getrusage() is fine here, though I would probably just grab it directly. > diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c > index 65b5dbe3c850..f3e9fda72400 100644 > --- a/io_uring/sqpoll.c > +++ b/io_uring/sqpoll.c > @@ -251,6 +251,9 @@ static int io_sq_thread(void *data) > } > > cap_entries = !list_is_singular(&sqd->ctx_list); > + ktime_t start, diff; > + > + start = ktime_get(); > list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) { > int ret = __io_sq_thread(ctx, cap_entries); But why on earth is this part then not doing getrusage() as well? > diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h > index 8df37e8c9149..c14c00240443 100644 > --- a/io_uring/sqpoll.h > +++ b/io_uring/sqpoll.h > @@ -16,6 +16,7 @@ struct io_sq_data { > pid_t task_pid; > pid_t task_tgid; > > + long long work_time; > unsigned long state; > struct completion exited; > }; Probably just make that an u64. As Pavel mentioned, I think we really need to consider if fdinfo is the appropriate API for this. It's fine if you're running stuff directly and you're just curious, but it's a very cumbersome API in general as you need to know the pid of the task holding the ring, the fd of the ring, and then you can get it as a textual description. If this is something that is deemed useful, would it not make more sense to make it programatically available in addition, or even exclusively?
On 1/18/24 19:34, Jens Axboe wrote: >> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h >> index 8df37e8c9149..c14c00240443 100644 >> --- a/io_uring/sqpoll.h >> +++ b/io_uring/sqpoll.h >> @@ -16,6 +16,7 @@ struct io_sq_data { >> pid_t task_pid; >> pid_t task_tgid; >> >> + long long work_time; >> unsigned long state; >> struct completion exited; >> }; > >Probably just make that an u64. > >As Pavel mentioned, I think we really need to consider if fdinfo is the >appropriate API for this. It's fine if you're running stuff directly and >you're just curious, but it's a very cumbersome API in general as you >need to know the pid of the task holding the ring, the fd of the ring, >and then you can get it as a textual description. If this is something >that is deemed useful, would it not make more sense to make it >programatically available in addition, or even exclusively? Hi, Jens and Pavel sorry for the late reply. I've tried some other methods, but overall, I haven't found a more suitable method than fdinfo. If you think it is troublesome to obtain the PID, then I can provide a shell script to output the total_time and work_time of all sqpoll threads to the terminal, so that we do not have to manually obtain the PID of each thread (the script can be placed in tools/ include/io_uring). eg: PID WorkTime(us) TotalTime(us) COMMAND 9330 1106578 2215321 iou-sqp-9329 9454 1510658 1715321 iou-sqp-9453 9478 165785 223219 iou-sqp-9477 9587 106578 153217 iou-sqp-9586 What do you think of this solution?
On 1/29/24 12:18 AM, Xiaobing Li wrote: > On 1/18/24 19:34, Jens Axboe wrote: >>> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h >>> index 8df37e8c9149..c14c00240443 100644 >>> --- a/io_uring/sqpoll.h >>> +++ b/io_uring/sqpoll.h >>> @@ -16,6 +16,7 @@ struct io_sq_data { >>> pid_t task_pid; >>> pid_t task_tgid; >>> >>> + long long work_time; >>> unsigned long state; >>> struct completion exited; >>> }; >> >> Probably just make that an u64. >> >> As Pavel mentioned, I think we really need to consider if fdinfo is the >> appropriate API for this. It's fine if you're running stuff directly and >> you're just curious, but it's a very cumbersome API in general as you >> need to know the pid of the task holding the ring, the fd of the ring, >> and then you can get it as a textual description. If this is something >> that is deemed useful, would it not make more sense to make it >> programatically available in addition, or even exclusively? > > Hi, Jens and Pavel > sorry for the late reply. > > I've tried some other methods, but overall, I haven't found a more suitable > method than fdinfo. > If you think it is troublesome to obtain the PID, then I can provide > a shell script to output the total_time and work_time of all sqpoll threads > to the terminal, so that we do not have to manually obtain the PID of each > thread (the script can be placed in tools/ include/io_uring). > > eg: > > PID WorkTime(us) TotalTime(us) COMMAND > 9330 1106578 2215321 iou-sqp-9329 > 9454 1510658 1715321 iou-sqp-9453 > 9478 165785 223219 iou-sqp-9477 > 9587 106578 153217 iou-sqp-9586 > > What do you think of this solution? I don't think it's a great interface, but at the same time, I don't feel that strongly about it and perhaps bundling a script that outputs the above in liburing would be Good Enough. I'm a bit reluctant to add a stats API via io_uring_register() just for this. So maybe spin a v8 with the s/long long/u64 change and include your script as well?
On 1/29/24 07:18, Xiaobing Li wrote: > On 1/18/24 19:34, Jens Axboe wrote: >>> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h >>> index 8df37e8c9149..c14c00240443 100644 >>> --- a/io_uring/sqpoll.h >>> +++ b/io_uring/sqpoll.h >>> @@ -16,6 +16,7 @@ struct io_sq_data { >>> pid_t task_pid; >>> pid_t task_tgid; >>> >>> + long long work_time; >>> unsigned long state; >>> struct completion exited; >>> }; >> >> Probably just make that an u64. >> >> As Pavel mentioned, I think we really need to consider if fdinfo is the >> appropriate API for this. It's fine if you're running stuff directly and >> you're just curious, but it's a very cumbersome API in general as you >> need to know the pid of the task holding the ring, the fd of the ring, >> and then you can get it as a textual description. If this is something >> that is deemed useful, would it not make more sense to make it >> programatically available in addition, or even exclusively? > > Hi, Jens and Pavel > sorry for the late reply. > > I've tried some other methods, but overall, I haven't found a more suitable > method than fdinfo. I wouldn't mind if it's fdinfo only for now, that can be changed later if needed. I'm more concerned that reading fdinfo and then parsing it is incompatible with the word performance, which you mentioned in the context of using 1 vs 2 syscalls to get the stats. That can be left to be resolved later, however. Let's just be clear in docs that stats could be 0, which means the feature is not working/disabled. Another question I raised in my reply (v6 thread), why it's using ktime_get(), which same as jiffies but more precise, instead of a task time? > If you think it is troublesome to obtain the PID, then I can provide I missed the context, where do we need to know PIDs? > a shell script to output the total_time and work_time of all sqpoll threads > to the terminal, so that we do not have to manually obtain the PID of each > thread (the script can be placed in tools/ include/io_uring). > > eg: > > PID WorkTime(us) TotalTime(us) COMMAND > 9330 1106578 2215321 iou-sqp-9329 > 9454 1510658 1715321 iou-sqp-9453 > 9478 165785 223219 iou-sqp-9477 > 9587 106578 153217 iou-sqp-9586 > > What do you think of this solution?
On 1/29/24 15:01, Pavel Begunkov wrote: >On 1/29/24 07:18, Xiaobing Li wrote: >> On 1/18/24 19:34, Jens Axboe wrote: >>>> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h >>>> index 8df37e8c9149..c14c00240443 100644 >>>> --- a/io_uring/sqpoll.h >>>> +++ b/io_uring/sqpoll.h >>>> @@ -16,6 +16,7 @@ struct io_sq_data { >>>> pid_t task_pid; >>>> pid_t task_tgid; >>>> >>>> + long long work_time; >>>> unsigned long state; >>>> struct completion exited; >>>> }; >>> >>> Probably just make that an u64. >>> >>> As Pavel mentioned, I think we really need to consider if fdinfo is the >>> appropriate API for this. It's fine if you're running stuff directly and >>> you're just curious, but it's a very cumbersome API in general as you >>> need to know the pid of the task holding the ring, the fd of the ring, >>> and then you can get it as a textual description. If this is something >>> that is deemed useful, would it not make more sense to make it >>> programatically available in addition, or even exclusively? >> >> Hi, Jens and Pavel >> sorry for the late reply. >> >> I've tried some other methods, but overall, I haven't found a more suitable >> method than fdinfo. > >I wouldn't mind if it's fdinfo only for now, that can be changed later >if needed. I'm more concerned that reading fdinfo and then parsing it >is incompatible with the word performance, which you mentioned in the >context of using 1 vs 2 syscalls to get the stats. > >That can be left to be resolved later, however. Let's just be clear >in docs that stats could be 0, which means the feature is not >working/disabled. > >Another question I raised in my reply (v6 thread), why it's using >ktime_get(), which same as jiffies but more precise, instead of a >task time? Sorry, I forgot to reply to you. I was thinking wrong. you are right, we can use "getrusage" to statistics the work_time of sqpoll in the ring. > > >> If you think it is troublesome to obtain the PID, then I can provide > >I missed the context, where do we need to know PIDs? Since obtaining the fdinfo content of sqpoll requires finding the corresponding PID first, I guess Jens thinks it is troublesome to manually obtain the PID of each sqpoll thread when there are many sqpoll threads. Therefore, I want to write a script that can automatically output all sqpoll statistics. -- Xiaobing Li
On 1/29/24 14:00, Jens Axboe wrote: >On 1/29/24 12:18 AM, Xiaobing Li wrote: >> On 1/18/24 19:34, Jens Axboe wrote: >>>> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h >>>> index 8df37e8c9149..c14c00240443 100644 >>>> --- a/io_uring/sqpoll.h >>>> +++ b/io_uring/sqpoll.h >>>> @@ -16,6 +16,7 @@ struct io_sq_data { >>>> pid_t task_pid; >>>> pid_t task_tgid; >>>> >>>> + long long work_time; >>>> unsigned long state; >>>> struct completion exited; >>>> }; >>> >>> Probably just make that an u64. >>> >>> As Pavel mentioned, I think we really need to consider if fdinfo is the >>> appropriate API for this. It's fine if you're running stuff directly and >>> you're just curious, but it's a very cumbersome API in general as you >>> need to know the pid of the task holding the ring, the fd of the ring, >>> and then you can get it as a textual description. If this is something >>> that is deemed useful, would it not make more sense to make it >>> programatically available in addition, or even exclusively? >> >> Hi, Jens and Pavel >> sorry for the late reply. >> >> I've tried some other methods, but overall, I haven't found a more suitable >> method than fdinfo. >> If you think it is troublesome to obtain the PID, then I can provide >> a shell script to output the total_time and work_time of all sqpoll threads >> to the terminal, so that we do not have to manually obtain the PID of each >> thread (the script can be placed in tools/ include/io_uring). >> >> eg: >> >> PID WorkTime(us) TotalTime(us) COMMAND >> 9330 1106578 2215321 iou-sqp-9329 >> 9454 1510658 1715321 iou-sqp-9453 >> 9478 165785 223219 iou-sqp-9477 >> 9587 106578 153217 iou-sqp-9586 >> >> What do you think of this solution? > >I don't think it's a great interface, but at the same time, I don't feel >that strongly about it and perhaps bundling a script that outputs the >above in liburing would be Good Enough. I'm a bit reluctant to add a >stats API via io_uring_register() just for this. > >So maybe spin a v8 with the s/long long/u64 change and include your >script as well? ok, I'll send out a v8. -- Xiaobing Li
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c index 976e9500f651..24a7452ed98e 100644 --- a/io_uring/fdinfo.c +++ b/io_uring/fdinfo.c @@ -64,6 +64,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) unsigned int sq_shift = 0; unsigned int sq_entries, cq_entries; int sq_pid = -1, sq_cpu = -1; + long long sq_total_time = 0, sq_work_time = 0; bool has_lock; unsigned int i; @@ -147,10 +148,17 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) sq_pid = sq->task_pid; sq_cpu = sq->sq_cpu; + struct rusage r; + + getrusage(sq->thread, RUSAGE_SELF, &r); + sq_total_time = r.ru_stime.tv_sec * 1000000 + r.ru_stime.tv_usec; + sq_work_time = sq->work_time; } seq_printf(m, "SqThread:\t%d\n", sq_pid); seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu); + seq_printf(m, "SqTotalTime:\t%lldus\n", sq_total_time); + seq_printf(m, "SqWorkTime:\t%lldus\n", sq_work_time); seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files); for (i = 0; has_lock && i < ctx->nr_user_files; i++) { struct file *f = io_file_from_index(&ctx->file_table, i); diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 65b5dbe3c850..f3e9fda72400 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -251,6 +251,9 @@ static int io_sq_thread(void *data) } cap_entries = !list_is_singular(&sqd->ctx_list); + ktime_t start, diff; + + start = ktime_get(); list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) { int ret = __io_sq_thread(ctx, cap_entries); @@ -260,6 +263,10 @@ static int io_sq_thread(void *data) if (io_run_task_work()) sqt_spin = true; + diff = ktime_sub(ktime_get(), start); + if (sqt_spin == true) + sqd->work_time += ktime_to_us(diff); + if (sqt_spin || !time_after(jiffies, timeout)) { if (sqt_spin) timeout = jiffies + sqd->sq_thread_idle; diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h index 8df37e8c9149..c14c00240443 100644 --- a/io_uring/sqpoll.h +++ b/io_uring/sqpoll.h @@ -16,6 +16,7 @@ struct io_sq_data { pid_t task_pid; pid_t task_tgid; + long long work_time; unsigned long state; struct completion exited; };
Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo. Variable description: "work_time" in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. "total_time" represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total. The test tool is fio, and its parameters are as follows: [global] ioengine=io_uring direct=1 group_reporting bs=128k norandommap=1 randrepeat=0 refill_buffers ramp_time=30s time_based runtime=1m clocksource=clock_gettime overwrite=1 log_avg_msec=1000 numjobs=1 [disk0] filename=/dev/nvme0n1 rw=read iodepth=16 hipri sqthread_poll=1 --- The test results are as follows: Every 2.0s: cat /proc/9230/fdinfo/6 | grep -E Sq SqMask: 0x3 SqHead: 3197153 SqTail: 3197153 CachedSqHead: 3197153 SqThread: 9231 SqThreadCpu: 11 SqTotalTime: 18099614us SqWorkTime: 16748316us --- The test results corresponding to different iodepths are as follows: |-----------|-------|-------|-------|------|-------| | iodepth | 1 | 4 | 8 | 16 | 64 | |-----------|-------|-------|-------|------|-------| |utilization| 2.9% | 8.8% | 10.9% | 92.9%| 84.4% | |-----------|-------|-------|-------|------|-------| | idle | 97.1% | 91.2% | 89.1% | 7.1% | 15.6% | |-----------|-------|-------|-------|------|-------| changes: v7: - Get the total time of the sqpoll thread through getrusage - The working time of the sqpoll thread is obtained through ktime_get() v6: - Replace the percentages in the fdinfo output with the actual running time and the time actually processing IO v5: - list the changes in each iteration. v4: - Resubmit the patch based on removing sq->lock v3: - output actual working time as a percentage of total time - detailed description of the meaning of each variable - added test results v2: - output the total statistical time and work time to fdinfo v1: - initial version - Statistics of total time and work time Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com> --- io_uring/fdinfo.c | 8 ++++++++ io_uring/sqpoll.c | 7 +++++++ io_uring/sqpoll.h | 1 + 3 files changed, 16 insertions(+)