diff mbox series

[v3] io_uring: Statistics of the true utilization of sq threads.

Message ID 20231115121839.12556-1-xiaobing.li@samsung.com (mailing list archive)
State New
Headers show
Series [v3] io_uring: Statistics of the true utilization of sq threads. | expand

Commit Message

Xiaobing Li Nov. 15, 2023, 12:18 p.m. UTC
v3:
1.Since the sq thread has a while(1) structure, during this process,
  there may be a lot of time that is not processing IO but does not
exceed the timeout period, therefore, the sqpoll thread will keep
running and will keep occupying the CPU. Obviously, the CPU is wasted at
this time;Our goal is to count the part of the time that the sqpoll
thread actually processes IO, so as to reflect the part of the CPU it
uses to process IO, which can be used to help improve the actual
utilization of the CPU in the future.

2."work_time" in the code represents the sum of the jiffies count of the
  sq thread actually processing IO, that is, how many milliseconds it
actually takes to process IO. "total_time" represents the total time
that the sq thread has elapsed from the beginning of the loop to the
current time point, that is, how many milliseconds it has spent in
total.
The output "SqBusy" represents the percentage of time utilization that
the sq thread actually uses to process IO.

3.The task_pid value in the io_sq_data structure should be assigned
  after the sq thread is created, otherwise the pid of its parent
process will be recorded.

4.After many tests, we do not need to obtain ctx->uring_lock in advance
  when obtaining ctx->sq_data. We can avoid null pointer references by
judging that ctx is not null.

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>

The test results are as follows:
Every 0.5s: cat /proc/281126/fdinfo/6 | grep Sq
SqMask: 0x3
SqHead: 1168417
SqTail: 1168418
CachedSqHead:   1168418
SqThread:       281126
SqThreadCpu:    55
SqBusy: 96%
---
 io_uring/fdinfo.c | 31 ++++++++++++++++---------------
 io_uring/sqpoll.c | 20 ++++++++++++++++----
 io_uring/sqpoll.h |  2 ++
 3 files changed, 34 insertions(+), 19 deletions(-)

Comments

Jens Axboe Nov. 15, 2023, 1:51 p.m. UTC | #1
On 11/15/23 5:18 AM, Xiaobing Li wrote:
> v3:
> 1.Since the sq thread has a while(1) structure, during this process,
>   there may be a lot of time that is not processing IO but does not
> exceed the timeout period, therefore, the sqpoll thread will keep
> running and will keep occupying the CPU. Obviously, the CPU is wasted at
> this time;Our goal is to count the part of the time that the sqpoll
> thread actually processes IO, so as to reflect the part of the CPU it
> uses to process IO, which can be used to help improve the actual
> utilization of the CPU in the future.
> 
> 2."work_time" in the code represents the sum of the jiffies count of the
>   sq thread actually processing IO, that is, how many milliseconds it
> actually takes to process IO. "total_time" represents the total time
> that the sq thread has elapsed from the beginning of the loop to the
> current time point, that is, how many milliseconds it has spent in
> total.
> The output "SqBusy" represents the percentage of time utilization that
> the sq thread actually uses to process IO.
> 
> 3.The task_pid value in the io_sq_data structure should be assigned
>   after the sq thread is created, otherwise the pid of its parent
> process will be recorded.
> 
> 4.After many tests, we do not need to obtain ctx->uring_lock in advance
>   when obtaining ctx->sq_data. We can avoid null pointer references by
> judging that ctx is not null.

This is mixed in with my patch, please just base it on top of the v4
I'll send out now rather than integrate it. And then you'd need that rcu
free patch as well on top of that, then your changes as a separate
patch.
Dan Carpenter Nov. 20, 2023, 2:26 p.m. UTC | #2
Hi Xiaobing,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/io_uring-Statistics-of-the-true-utilization-of-sq-threads/20231115-211954
base:   linus/master
patch link:    https://lore.kernel.org/r/20231115121839.12556-1-xiaobing.li%40samsung.com
patch subject: [PATCH v3] io_uring: Statistics of the true utilization of sq threads.
config: x86_64-randconfig-161-20231115 (https://download.01.org/0day-ci/archive/20231116/202311160629.h4GrebJh-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20231116/202311160629.h4GrebJh-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202311160629.h4GrebJh-lkp@intel.com/

smatch warnings:
io_uring/fdinfo.c:138 io_uring_show_fdinfo() warn: variable dereferenced before check 'ctx' (see line 57)
io_uring/fdinfo.c:144 io_uring_show_fdinfo() error: we previously assumed 'sq' could be null (see line 141)

vim +/ctx +138 io_uring/fdinfo.c

3aaf22b62a9270 Jens Axboe     2023-07-10   53  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
a4ad4f748ea962 Jens Axboe     2022-05-25   54  {
3aaf22b62a9270 Jens Axboe     2023-07-10   55  	struct io_ring_ctx *ctx = f->private_data;
a4ad4f748ea962 Jens Axboe     2022-05-25   56  	struct io_overflow_cqe *ocqe;
a4ad4f748ea962 Jens Axboe     2022-05-25  @57  	struct io_rings *r = ctx->rings;
                                                                     ^^^^^^^^^^
Tons of unchecked dereferences so ctx can't be NULL.

a4ad4f748ea962 Jens Axboe     2022-05-25   58  	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
a4ad4f748ea962 Jens Axboe     2022-05-25   59  	unsigned int sq_head = READ_ONCE(r->sq.head);
a4ad4f748ea962 Jens Axboe     2022-05-25   60  	unsigned int sq_tail = READ_ONCE(r->sq.tail);
a4ad4f748ea962 Jens Axboe     2022-05-25   61  	unsigned int cq_head = READ_ONCE(r->cq.head);
a4ad4f748ea962 Jens Axboe     2022-05-25   62  	unsigned int cq_tail = READ_ONCE(r->cq.tail);
a4ad4f748ea962 Jens Axboe     2022-05-25   63  	unsigned int cq_shift = 0;
3b8fdd1dc35e39 Jens Axboe     2022-09-11   64  	unsigned int sq_shift = 0;
a4ad4f748ea962 Jens Axboe     2022-05-25   65  	unsigned int sq_entries, cq_entries;
7644b1a1c9a7ae Jens Axboe     2023-10-21   66  	int sq_pid = -1, sq_cpu = -1;
5b1b61674371b7 Xiaobing Li    2023-11-15   67  	int sq_busy = 0;
a4ad4f748ea962 Jens Axboe     2022-05-25   68  	bool has_lock;
a4ad4f748ea962 Jens Axboe     2022-05-25   69  	unsigned int i;
a4ad4f748ea962 Jens Axboe     2022-05-25   70  
4f731705cc1f15 Jens Axboe     2022-09-11   71  	if (ctx->flags & IORING_SETUP_CQE32)
a4ad4f748ea962 Jens Axboe     2022-05-25   72  		cq_shift = 1;
3b8fdd1dc35e39 Jens Axboe     2022-09-11   73  	if (ctx->flags & IORING_SETUP_SQE128)
3b8fdd1dc35e39 Jens Axboe     2022-09-11   74  		sq_shift = 1;
a4ad4f748ea962 Jens Axboe     2022-05-25   75  
a4ad4f748ea962 Jens Axboe     2022-05-25   76  	/*
a4ad4f748ea962 Jens Axboe     2022-05-25   77  	 * we may get imprecise sqe and cqe info if uring is actively running
a4ad4f748ea962 Jens Axboe     2022-05-25   78  	 * since we get cached_sq_head and cached_cq_tail without uring_lock
a4ad4f748ea962 Jens Axboe     2022-05-25   79  	 * and sq_tail and cq_head are changed by userspace. But it's ok since
a4ad4f748ea962 Jens Axboe     2022-05-25   80  	 * we usually use these info when it is stuck.
a4ad4f748ea962 Jens Axboe     2022-05-25   81  	 */
a4ad4f748ea962 Jens Axboe     2022-05-25   82  	seq_printf(m, "SqMask:\t0x%x\n", sq_mask);
a4ad4f748ea962 Jens Axboe     2022-05-25   83  	seq_printf(m, "SqHead:\t%u\n", sq_head);
a4ad4f748ea962 Jens Axboe     2022-05-25   84  	seq_printf(m, "SqTail:\t%u\n", sq_tail);
a4ad4f748ea962 Jens Axboe     2022-05-25   85  	seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
a4ad4f748ea962 Jens Axboe     2022-05-25   86  	seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
a4ad4f748ea962 Jens Axboe     2022-05-25   87  	seq_printf(m, "CqHead:\t%u\n", cq_head);
a4ad4f748ea962 Jens Axboe     2022-05-25   88  	seq_printf(m, "CqTail:\t%u\n", cq_tail);
a4ad4f748ea962 Jens Axboe     2022-05-25   89  	seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
3b8fdd1dc35e39 Jens Axboe     2022-09-11   90  	seq_printf(m, "SQEs:\t%u\n", sq_tail - sq_head);
a4ad4f748ea962 Jens Axboe     2022-05-25   91  	sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
a4ad4f748ea962 Jens Axboe     2022-05-25   92  	for (i = 0; i < sq_entries; i++) {
a4ad4f748ea962 Jens Axboe     2022-05-25   93  		unsigned int entry = i + sq_head;
a4ad4f748ea962 Jens Axboe     2022-05-25   94  		struct io_uring_sqe *sqe;
3b8fdd1dc35e39 Jens Axboe     2022-09-11   95  		unsigned int sq_idx;
a4ad4f748ea962 Jens Axboe     2022-05-25   96  
32f5dea040ee6e Jens Axboe     2023-09-01   97  		if (ctx->flags & IORING_SETUP_NO_SQARRAY)
32f5dea040ee6e Jens Axboe     2023-09-01   98  			break;
3b8fdd1dc35e39 Jens Axboe     2022-09-11   99  		sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
a4ad4f748ea962 Jens Axboe     2022-05-25  100  		if (sq_idx > sq_mask)
a4ad4f748ea962 Jens Axboe     2022-05-25  101  			continue;
00927931cb630b Pavel Begunkov 2022-10-11  102  		sqe = &ctx->sq_sqes[sq_idx << sq_shift];
3b8fdd1dc35e39 Jens Axboe     2022-09-11  103  		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
3b8fdd1dc35e39 Jens Axboe     2022-09-11  104  			      "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
3b8fdd1dc35e39 Jens Axboe     2022-09-11  105  			      "user_data:%llu",
3b8fdd1dc35e39 Jens Axboe     2022-09-11  106  			   sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd,
3b8fdd1dc35e39 Jens Axboe     2022-09-11  107  			   sqe->flags, (unsigned long long) sqe->off,
3b8fdd1dc35e39 Jens Axboe     2022-09-11  108  			   (unsigned long long) sqe->addr, sqe->rw_flags,
3b8fdd1dc35e39 Jens Axboe     2022-09-11  109  			   sqe->buf_index, sqe->user_data);
3b8fdd1dc35e39 Jens Axboe     2022-09-11  110  		if (sq_shift) {
3b8fdd1dc35e39 Jens Axboe     2022-09-11  111  			u64 *sqeb = (void *) (sqe + 1);
3b8fdd1dc35e39 Jens Axboe     2022-09-11  112  			int size = sizeof(struct io_uring_sqe) / sizeof(u64);
3b8fdd1dc35e39 Jens Axboe     2022-09-11  113  			int j;
3b8fdd1dc35e39 Jens Axboe     2022-09-11  114  
3b8fdd1dc35e39 Jens Axboe     2022-09-11  115  			for (j = 0; j < size; j++) {
3b8fdd1dc35e39 Jens Axboe     2022-09-11  116  				seq_printf(m, ", e%d:0x%llx", j,
3b8fdd1dc35e39 Jens Axboe     2022-09-11  117  						(unsigned long long) *sqeb);
3b8fdd1dc35e39 Jens Axboe     2022-09-11  118  				sqeb++;
3b8fdd1dc35e39 Jens Axboe     2022-09-11  119  			}
3b8fdd1dc35e39 Jens Axboe     2022-09-11  120  		}
3b8fdd1dc35e39 Jens Axboe     2022-09-11  121  		seq_printf(m, "\n");
a4ad4f748ea962 Jens Axboe     2022-05-25  122  	}
a4ad4f748ea962 Jens Axboe     2022-05-25  123  	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
a4ad4f748ea962 Jens Axboe     2022-05-25  124  	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
a4ad4f748ea962 Jens Axboe     2022-05-25  125  	for (i = 0; i < cq_entries; i++) {
a4ad4f748ea962 Jens Axboe     2022-05-25  126  		unsigned int entry = i + cq_head;
a4ad4f748ea962 Jens Axboe     2022-05-25  127  		struct io_uring_cqe *cqe = &r->cqes[(entry & cq_mask) << cq_shift];
a4ad4f748ea962 Jens Axboe     2022-05-25  128  
4f731705cc1f15 Jens Axboe     2022-09-11  129  		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
a4ad4f748ea962 Jens Axboe     2022-05-25  130  			   entry & cq_mask, cqe->user_data, cqe->res,
a4ad4f748ea962 Jens Axboe     2022-05-25  131  			   cqe->flags);
4f731705cc1f15 Jens Axboe     2022-09-11  132  		if (cq_shift)
4f731705cc1f15 Jens Axboe     2022-09-11  133  			seq_printf(m, ", extra1:%llu, extra2:%llu\n",
4f731705cc1f15 Jens Axboe     2022-09-11  134  					cqe->big_cqe[0], cqe->big_cqe[1]);
4f731705cc1f15 Jens Axboe     2022-09-11  135  		seq_printf(m, "\n");
a4ad4f748ea962 Jens Axboe     2022-05-25  136  	}
a4ad4f748ea962 Jens Axboe     2022-05-25  137  
5b1b61674371b7 Xiaobing Li    2023-11-15 @138  	if (ctx && (ctx->flags & IORING_SETUP_SQPOLL)) {
                                                    ^^^
Delete this check.

5b1b61674371b7 Xiaobing Li    2023-11-15  139  		struct io_sq_data *sq = ctx->sq_data;
5b1b61674371b7 Xiaobing Li    2023-11-15  140  
5b1b61674371b7 Xiaobing Li    2023-11-15 @141  		if (sq && sq->total_time != 0)
                                                            ^^
sq can be NULL?

5b1b61674371b7 Xiaobing Li    2023-11-15  142  			sq_busy = (int)(sq->work_time * 100 / sq->total_time);
5b1b61674371b7 Xiaobing Li    2023-11-15  143  
5b1b61674371b7 Xiaobing Li    2023-11-15 @144  		sq_pid = sq->task_pid;
                                                                 ^^^^
Unchecked dereference.
diff mbox series

Patch

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f04a43044d91..b9e2e339140d 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -64,6 +64,7 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	unsigned int sq_shift = 0;
 	unsigned int sq_entries, cq_entries;
 	int sq_pid = -1, sq_cpu = -1;
+	int sq_busy = 0;
 	bool has_lock;
 	unsigned int i;
 
@@ -134,6 +135,21 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 		seq_printf(m, "\n");
 	}
 
+	if (ctx && (ctx->flags & IORING_SETUP_SQPOLL)) {
+		struct io_sq_data *sq = ctx->sq_data;
+
+		if (sq && sq->total_time != 0)
+			sq_busy = (int)(sq->work_time * 100 / sq->total_time);
+
+		sq_pid = sq->task_pid;
+		sq_cpu = sq->sq_cpu;
+	}
+
+	seq_printf(m, "SqThread:\t%d\n", sq_pid);
+	seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
+	seq_printf(m, "SqBusy:\t%d%%\n", sq_busy);
+	seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
+
 	/*
 	 * Avoid ABBA deadlock between the seq lock and the io_uring mutex,
 	 * since fdinfo case grabs it in the opposite direction of normal use
@@ -142,21 +158,6 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	 */
 	has_lock = mutex_trylock(&ctx->uring_lock);
 
-	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
-		struct io_sq_data *sq = ctx->sq_data;
-
-		if (mutex_trylock(&sq->lock)) {
-			if (sq->thread) {
-				sq_pid = task_pid_nr(sq->thread);
-				sq_cpu = task_cpu(sq->thread);
-			}
-			mutex_unlock(&sq->lock);
-		}
-	}
-
-	seq_printf(m, "SqThread:\t%d\n", sq_pid);
-	seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
-	seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
 	for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
 		struct file *f = io_file_from_index(&ctx->file_table, i);
 
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..dc093adc1ce5 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -224,17 +224,21 @@  static int io_sq_thread(void *data)
 	struct io_ring_ctx *ctx;
 	unsigned long timeout = 0;
 	char buf[TASK_COMM_LEN];
+	unsigned long sq_start, sq_work_begin, sq_work_end;
 	DEFINE_WAIT(wait);
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
 
-	if (sqd->sq_cpu != -1)
+	if (sqd->sq_cpu != -1) {
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
+	} else {
 		set_cpus_allowed_ptr(current, cpu_online_mask);
+		sqd->sq_cpu = raw_smp_processor_id();
+	}
 
 	mutex_lock(&sqd->lock);
+	sq_start = jiffies;
 	while (1) {
 		bool cap_entries, sqt_spin = false;
 
@@ -245,6 +249,7 @@  static int io_sq_thread(void *data)
 		}
 
 		cap_entries = !list_is_singular(&sqd->ctx_list);
+		sq_work_begin = jiffies;
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
 			int ret = __io_sq_thread(ctx, cap_entries);
 
@@ -254,6 +259,11 @@  static int io_sq_thread(void *data)
 		if (io_run_task_work())
 			sqt_spin = true;
 
+		sq_work_end = jiffies;
+		sqd->total_time = sq_work_end - sq_start;
+		if (sqt_spin == true)
+			sqd->work_time += sq_work_end - sq_work_begin;
+
 		if (sqt_spin || !time_after(jiffies, timeout)) {
 			if (sqt_spin)
 				timeout = jiffies + sqd->sq_thread_idle;
@@ -261,6 +271,7 @@  static int io_sq_thread(void *data)
 				mutex_unlock(&sqd->lock);
 				cond_resched();
 				mutex_lock(&sqd->lock);
+				sqd->sq_cpu = raw_smp_processor_id();
 			}
 			continue;
 		}
@@ -294,6 +305,7 @@  static int io_sq_thread(void *data)
 				mutex_unlock(&sqd->lock);
 				schedule();
 				mutex_lock(&sqd->lock);
+				sqd->sq_cpu = raw_smp_processor_id();
 			}
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 				atomic_andnot(IORING_SQ_NEED_WAKEUP,
@@ -395,14 +407,14 @@  __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 			sqd->sq_cpu = -1;
 		}
 
-		sqd->task_pid = current->pid;
-		sqd->task_tgid = current->tgid;
+		sqd->task_tgid = current->pid;
 		tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
 		if (IS_ERR(tsk)) {
 			ret = PTR_ERR(tsk);
 			goto err_sqpoll;
 		}
 
+		sqd->task_pid = task_pid_nr(tsk);
 		sqd->thread = tsk;
 		ret = io_uring_alloc_task_context(tsk, ctx);
 		wake_up_new_task(tsk);
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 8df37e8c9149..fd6fa9587843 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -16,6 +16,8 @@  struct io_sq_data {
 	pid_t			task_pid;
 	pid_t			task_tgid;
 
+	unsigned long       work_time;
+	unsigned long       total_time;
 	unsigned long		state;
 	struct completion	exited;
 };