diff mbox series

[12/13] target, vhost-scsi: don't switch cpus on completion

Message ID 20210209123845.4856-13-michael.christie@oracle.com (mailing list archive)
State New, archived
Headers show
Series target: fix cmd plugging and completion | expand

Commit Message

Mike Christie Feb. 9, 2021, 12:38 p.m. UTC
LIO wants to complete a cmd on the CPU it was submitted on, because
most drivers have per cpu or hw queue handlers. But, for vhost-scsi
which has the single thread for submissions and completions this
is not always the best thing to do since the thread could be running
on a different CPU now, and it conflicts with what the user has setup
in the lower levels with settings like the block layer rq_affinity
or for network block devices what the user has setup on their nic.

This patch has vhost-scsi tell LIO to complete the cmd on the CPU the
layer below LIO has completed the cmd on. We then stop fighting
the block, net and whatever layer/setting is below us.

With this patch and the previous ones I see an increase in IOPs by about
50% (234K -> 350K) for random 4K workloads like:

fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
--ioengine=libaio --iodepth=128  --numjobs=8 --time_based
--group_reporting --runtime=60

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/target/target_core_transport.c | 10 +++++++++-
 drivers/vhost/scsi.c                   |  3 ++-
 include/target/target_core_base.h      |  2 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig Feb. 10, 2021, 8:44 a.m. UTC | #1
>  	struct se_device *se_dev = se_cmd->se_dev;
> -	int cpu = se_cmd->cpuid;
> +	int cpu;
> +
> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
> +		cpu = smp_processor_id();
> +	else
> +		cpu = se_cmd->cpuid;
>  
>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>  			      target_completion_wq);

Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
indicator, which would remove all branches here.
Mike Christie Feb. 10, 2021, 6:43 p.m. UTC | #2
On 2/10/21 2:44 AM, Christoph Hellwig wrote:
>>  	struct se_device *se_dev = se_cmd->se_dev;
>> -	int cpu = se_cmd->cpuid;
>> +	int cpu;
>> +
>> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
>> +		cpu = smp_processor_id();
>> +	else
>> +		cpu = se_cmd->cpuid;
>>  
>>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>>  			      target_completion_wq);
> 
> Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
> indicator, which would remove all branches here.

We can't right now because the workqueue_struct does
not have the WQ_UNBOUND bit set. __queue_work will then do:

        /* pwq which will be used unless @work is executing elsewhere */
        if (wq->flags & WQ_UNBOUND) {
                if (req_cpu == WORK_CPU_UNBOUND)
                        cpu = wq_select_unbound_cpu(raw_smp_processor_id());
                pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
        } else {
                if (req_cpu == WORK_CPU_UNBOUND)
                        cpu = raw_smp_processor_id();
                pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
        }

so even if you pass in WORK_CPU_UNBOUND, it will do raw_smp_processor_id
and add the work to the cpu's worker pool.

I think if I add in a new tunable to make the workqueue bound or unbound like I
mentioned in the other thread then I think it will do what you want for both
review comments.
Mike Christie Feb. 10, 2021, 6:45 p.m. UTC | #3
On 2/10/21 12:43 PM, Mike Christie wrote:
> On 2/10/21 2:44 AM, Christoph Hellwig wrote:
>>>  	struct se_device *se_dev = se_cmd->se_dev;
>>> -	int cpu = se_cmd->cpuid;
>>> +	int cpu;
>>> +
>>> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
>>> +		cpu = smp_processor_id();
>>> +	else
>>> +		cpu = se_cmd->cpuid;
>>>  
>>>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>>>  			      target_completion_wq);
>>
>> Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
>> indicator, which would remove all branches here.
> 
> We can't right now because the workqueue_struct does
> not have the WQ_UNBOUND bit set. __queue_work will then do:
> 
>         /* pwq which will be used unless @work is executing elsewhere */
>         if (wq->flags & WQ_UNBOUND) {
>                 if (req_cpu == WORK_CPU_UNBOUND)
>                         cpu = wq_select_unbound_cpu(raw_smp_processor_id());
>                 pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
>         } else {
>                 if (req_cpu == WORK_CPU_UNBOUND)
>                         cpu = raw_smp_processor_id();
>                 pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
>         }
> 
> so even if you pass in WORK_CPU_UNBOUND, it will do raw_smp_processor_id
> and add the work to the cpu's worker pool.

Nevermind. What I wrote is exactly what you asked for :) I can do that.
diff mbox series

Patch

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index b1f920c1b816..e5e555dcd73a 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -828,7 +828,12 @@  static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
 static void target_queue_cmd_compl(struct se_cmd *se_cmd)
 {
 	struct se_device *se_dev = se_cmd->se_dev;
-	int cpu = se_cmd->cpuid;
+	int cpu;
+
+	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
+		cpu = smp_processor_id();
+	else
+		cpu = se_cmd->cpuid;
 
 	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
 			      target_completion_wq);
@@ -1609,6 +1614,9 @@  target_submit_prep(struct se_cmd *se_cmd, struct se_session *se_sess,
 	BUG_ON(!se_tpg);
 	BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
 
+	if (flags & TARGET_SCF_IGNORE_CPUID_COMPL)
+		se_cmd->se_cmd_flags |= SCF_IGNORE_CPUID_COMPL;
+
 	if (flags & TARGET_SCF_USE_CPUID)
 		se_cmd->se_cmd_flags |= SCF_USE_CPUID;
 	/*
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 99909c6f3960..b9addd5fdd28 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -803,7 +803,8 @@  static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
 			&cmd->tvc_sense_buf[0], cmd->tvc_lun,
 			cmd->tvc_exp_data_len,
 			vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
-			cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
+			cmd->tvc_data_direction,
+			TARGET_SCF_ACK_KREF | TARGET_SCF_IGNORE_CPUID_COMPL,
 			sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
 			cmd->tvc_prot_sgl_count);
 	if (rc < 0) {
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index f2ba7de59da7..bb4ac7e6f560 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -146,6 +146,7 @@  enum se_cmd_flags_table {
 	SCF_USE_CPUID				= (1 << 16),
 	SCF_TASK_ATTR_SET			= (1 << 17),
 	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
+	SCF_IGNORE_CPUID_COMPL			= (1 << 19)
 };
 
 /*
@@ -195,6 +196,7 @@  enum target_sc_flags_table {
 	TARGET_SCF_ACK_KREF		= 0x02,
 	TARGET_SCF_UNKNOWN_SIZE		= 0x04,
 	TARGET_SCF_USE_CPUID		= 0x08,
+	TARGET_SCF_IGNORE_CPUID_COMPL	= 0x10,
 };
 
 /* fabric independent task management function values */