
[v2] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval

Message ID 202966f7-3e79-4913-a7db-6b2fc230dda7@kernel.dk (mailing list archive)
State New
Series [v2] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval

Commit Message

Jens Axboe Nov. 14, 2023, 5:09 p.m. UTC
A previous commit added a trylock for getting the SQPOLL thread info via
fdinfo, but this introduced a regression where we often fail to get it if
the thread is busy. In that case, we end up not printing the current CPU
and PID info.

Rather than rely on this lock, just print the pid we already stored in
the io_sq_data struct, and ensure we update the current CPU every time we
are about to sleep. The latter may not be 100% accurate, but neither was
the old approach, since the task can be migrated at any time unless it
has been pinned at creation time.

We keep the io_sq_data dereference inside ctx->uring_lock, as it has
always been, since destruction of the thread and data happens under that
lock. We could make this RCU safe, but there's little point in doing so.

With this, we always print the last valid information we had, rather
than spuriously omitting it.

Fixes: 7644b1a1c9a7 ("io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

v2: actually remember to use the cached values... also update ->sq_cpu
    when we initially set it up, if it's not pinned to a given CPU.

Comments

Gabriel Krisman Bertazi Nov. 14, 2023, 10:39 p.m. UTC | #1
Jens Axboe <axboe@kernel.dk> writes:

> [...]
>
> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index f04a43044d91..976e9500f651 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -145,13 +145,8 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>  	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>  		struct io_sq_data *sq = ctx->sq_data;
>  
> -		if (mutex_trylock(&sq->lock)) {
> -			if (sq->thread) {
> -				sq_pid = task_pid_nr(sq->thread);
> -				sq_cpu = task_cpu(sq->thread);
> -			}
> -			mutex_unlock(&sq->lock);
> -		}
> +		sq_pid = sq->task_pid;
> +		sq_cpu = sq->sq_cpu;
>  	}
>  
>  	seq_printf(m, "SqThread:\t%d\n", sq_pid);
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..ecb00322a4e5 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -229,10 +229,12 @@ static int io_sq_thread(void *data)
>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>  	set_task_comm(current, buf);
>  
> -	if (sqd->sq_cpu != -1)
> +	if (sqd->sq_cpu != -1) {
>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
> -	else
> +	} else {
>  		set_cpus_allowed_ptr(current, cpu_online_mask);
> +		sqd->sq_cpu = task_cpu(current);
> +	}
>  
>  	mutex_lock(&sqd->lock);
>  	while (1) {
> @@ -291,6 +293,7 @@ static int io_sq_thread(void *data)
>  			}
>  
>  			if (needs_sched) {
> +				sqd->sq_cpu = task_cpu(current);

Don't you also need to update sqd->sq_cpu in io_sqd_handle_event before
releasing the lock?  sqpoll might get migrated after the following
schedule and then parked, in which case sqd->sq_cpu will not be up to
date for a while.

Other than that, I think the patch is fine.

>  				mutex_unlock(&sqd->lock);
>  				schedule();
>  				mutex_lock(&sqd->lock);
Jens Axboe Nov. 15, 2023, 2:34 a.m. UTC | #2
On 11/14/23 3:39 PM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> [...]
>>  			if (needs_sched) {
>> +				sqd->sq_cpu = task_cpu(current);
> 
> Don't you also need to update sqd->sq_cpu in io_sqd_handle_event before
> releasing the lock?  sqpoll might get migrated after the following
> schedule and then parked, in which case sqd->sq_cpu will not be up to
> date for a while.
> 
> Other than that, I think the patch is fine.

We probably should, and I also forgot that we can just use
raw_smp_processor_id() at this point. I'll send out a v3.
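
For readers following along, the change being discussed for v3 would look roughly like the hunk below: updating the cached ->sq_cpu in io_sqd_handle_event() after re-taking the lock, using raw_smp_processor_id(). This is a sketch only — the context lines are approximated from the surrounding discussion, and the actual v3 posted to the list is authoritative.

```diff
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ static int io_sqd_handle_event(struct io_sq_data *sqd)
 		if (signal_pending(current))
 			did_sig = get_signal(&ksig);
 		cond_resched();
 		mutex_lock(&sqd->lock);
+		/* refresh the cached CPU before anyone reads ->sq_cpu */
+		sqd->sq_cpu = raw_smp_processor_id();
 	}
 	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
```

The same reasoning would apply to the other update sites in this patch: since the thread is running when it stores the value, raw_smp_processor_id() suffices and avoids the task_cpu() indirection.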

Patch

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f04a43044d91..976e9500f651 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -145,13 +145,8 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
 		struct io_sq_data *sq = ctx->sq_data;
 
-		if (mutex_trylock(&sq->lock)) {
-			if (sq->thread) {
-				sq_pid = task_pid_nr(sq->thread);
-				sq_cpu = task_cpu(sq->thread);
-			}
-			mutex_unlock(&sq->lock);
-		}
+		sq_pid = sq->task_pid;
+		sq_cpu = sq->sq_cpu;
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..ecb00322a4e5 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -229,10 +229,12 @@  static int io_sq_thread(void *data)
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
 
-	if (sqd->sq_cpu != -1)
+	if (sqd->sq_cpu != -1) {
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
+	} else {
 		set_cpus_allowed_ptr(current, cpu_online_mask);
+		sqd->sq_cpu = task_cpu(current);
+	}
 
 	mutex_lock(&sqd->lock);
 	while (1) {
@@ -291,6 +293,7 @@  static int io_sq_thread(void *data)
 			}
 
 			if (needs_sched) {
+				sqd->sq_cpu = task_cpu(current);
 				mutex_unlock(&sqd->lock);
 				schedule();
 				mutex_lock(&sqd->lock);