From patchwork Mon May 27 15:02:03 2019
From: Ming Lei
To: Jens Axboe, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, James Bottomley, linux-scsi@vger.kernel.org, Bart Van Assche, Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig, Ming Lei
Subject: [PATCH V2 1/5] scsi: select reply queue from request's CPU
Date: Mon, 27 May 2019 23:02:03 +0800
Message-Id: <20190527150207.11372-2-ming.lei@redhat.com>
In-Reply-To: <20190527150207.11372-1-ming.lei@redhat.com>
References: <20190527150207.11372-1-ming.lei@redhat.com>

hisi_sas_v3_hw, hpsa, megaraid and mpt3sas use a single blk-mq hw queue to submit requests, while using multiple private reply queues as completion queues. The mapping between CPU and reply queue is set up via pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY), just like the usual blk-mq queue mapping.

These drivers always use the current CPU (raw_smp_processor_id()) to figure out the reply queue. Switch to using the request's CPU instead, so that in-flight requests can be drained via blk-mq's API before the last CPU of a reply queue goes offline.
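The change above can be modeled in plain userspace C: a reply_map table maps each CPU to a reply queue, and the lookup keys off the CPU recorded in the request rather than the CPU executing the submission path. All names here (reply_map, struct request_model) are illustrative stand-ins, not the kernel API; the map contents are a made-up example.

```c
#include <assert.h>

/* Hypothetical per-CPU reply-queue map, as would be produced by
 * pci_alloc_irq_vectors_affinity(); the values are made up. */
#define NR_CPUS 4
static const unsigned int reply_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for a blk-mq request; cpu is what blk_mq_rq_cpu() would return. */
struct request_model {
    int cpu; /* CPU the request was submitted from */
};

static const struct request_model sample_rq = { .cpu = 2 };

/* Old behaviour: key off whichever CPU happens to run this code. */
static unsigned int reply_queue_old(int current_cpu)
{
    return reply_map[current_cpu];
}

/* New behaviour: key off the request's own CPU, so the block layer can
 * drain a reply queue before its last CPU goes offline. */
static unsigned int reply_queue_new(const struct request_model *rq)
{
    return reply_map[rq->cpu];
}
```

With this shape, the reply queue for a request is stable regardless of which CPU later touches it, which is what makes the drain in patch 5/5 possible.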
Signed-off-by: Ming Lei
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +++--
 drivers/scsi/hpsa.c                         |  2 +-
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++++++++--------
 include/scsi/scsi_cmnd.h                    | 11 +++++++++++
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 8a7feb8ed8d6..ab9d8e7bfc8e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -471,9 +471,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
+	/* only V3 hardware setup .reply_map */
 	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+		unsigned int dq_index = hisi_hba->reply_map[
+				scsi_cmnd_cpu(task->uldd_task)];
 
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 1bef1da273c2..72f9edb86752 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1145,7 +1145,7 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = h->reply_map[scsi_cmnd_cpu(c->scsi_cmd)];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4dfa0685a86c..6bed77cfaf9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2699,7 +2699,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 	}
 
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		instance->reply_map[scsi_cmnd_cpu(scp)];
 
 	if (instance->adapter_type >= VENTURA_SERIES) {
 		/* FP for Optimal raid level 1.
@@ -3013,7 +3013,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		instance->reply_map[scsi_cmnd_cpu(scmd)];
 
 	if (!fp_possible) {
 		/* system pd firmware path */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 8aacbd1e7db2..8135e980f591 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3266,7 +3266,7 @@ mpt3sas_base_get_reply_virt_addr(struct MPT3SAS_ADAPTER *ioc, u32 phys_addr)
 }
 
 static inline u8
-_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
+_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 {
 	/* Enables reply_queue load balancing */
 	if (ioc->msix_load_balance)
@@ -3274,7 +3274,7 @@ _base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
 		base_mod64(atomic64_add_return(1, &ioc->total_io_cnt),
 		    ioc->reply_queue_count) : 0;
 
-	return ioc->cpu_msix_table[raw_smp_processor_id()];
+	return ioc->cpu_msix_table[scsi_cmnd_cpu(scmd)];
 }
 
 /**
@@ -3325,7 +3325,7 @@ mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
 	smid = tag + 1;
 	request->cb_idx = cb_idx;
-	request->msix_io = _base_get_msix_index(ioc);
+	request->msix_io = _base_get_msix_index(ioc, scmd);
 	request->smid = smid;
 	INIT_LIST_HEAD(&request->chain_list);
 	return smid;
@@ -3498,7 +3498,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 	_base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
 					ioc->request_sz);
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3520,7 +3520,7 @@ _base_put_smid_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3543,7 +3543,7 @@ mpt3sas_base_put_smid_fast_path(struct MPT3SAS_ADAPTER *ioc, u16 smid,
 	descriptor.SCSIIO.RequestFlags =
 	    MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3607,7 +3607,7 @@ mpt3sas_base_put_smid_nvme_encap(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 	descriptor.Default.RequestFlags =
 	    MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED;
-	descriptor.Default.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
@@ -3639,7 +3639,7 @@ mpt3sas_base_put_smid_default(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 	}
 	request = (u64 *)&descriptor;
 	descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
-	descriptor.Default.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 76ed5e4acd38..ab60883c2c40 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -332,4 +332,15 @@ static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
 	return xfer_len;
 }
 
+static inline int scsi_cmnd_cpu(struct scsi_cmnd *scmd)
+{
+	if (!scmd || !scmd->request)
+		return raw_smp_processor_id();
+
+	if (!scmd->request->mq_ctx)
+		return raw_smp_processor_id();
+
+	return blk_mq_rq_cpu(scmd->request);
+}
+
 #endif /* _SCSI_SCSI_CMND_H */

From patchwork Mon May 27 15:02:04 2019
From: Ming Lei
To: Jens Axboe, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, James Bottomley, linux-scsi@vger.kernel.org, Bart Van Assche, Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig, Ming Lei
Subject: [PATCH V2 2/5] blk-mq: introduce .complete_queue_affinity
Date: Mon, 27 May 2019 23:02:04 +0800
Message-Id: <20190527150207.11372-3-ming.lei@redhat.com>
In-Reply-To: <20190527150207.11372-1-ming.lei@redhat.com>
References: <20190527150207.11372-1-ming.lei@redhat.com>

Some SCSI devices support a single hw queue (tags) while allowing multiple private completion queues for handling request delivery and completion. The mapping between CPU and private completion queue is set up via pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY), just like the normal blk-mq queue mapping.

Introduce a .complete_queue_affinity callback for retrieving a completion queue's affinity, so that in-flight requests delivered from a completion queue can be drained when the last CPU of that completion queue goes offline.
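The callback's contract can be sketched in userspace C: given a CPU, return the set of CPUs that share that CPU's completion queue. The names and the bitmask-as-cpumask representation below are illustrative assumptions, not the blk-mq API.

```c
#include <assert.h>

/* Model a cpumask as one bit per CPU. */
#define NR_CPUS 4
typedef unsigned long cpumask_model;

/* Hypothetical layout: queue 0 serves CPUs 0-1, queue 1 serves CPUs 2-3,
 * mirroring what pci_alloc_irq_vectors_affinity() might produce. */
static const cpumask_model queue_affinity[] = { 0x3, 0xc };
static const unsigned int cpu_to_queue[NR_CPUS] = { 0, 0, 1, 1 };

/* Model of the .complete_queue_affinity contract: the passed CPU is
 * always a member of the returned mask. */
static const cpumask_model *complete_queue_affinity(int cpu)
{
    return &queue_affinity[cpu_to_queue[cpu]];
}
```

The invariant the comment in the patch states, that the passed CPU is included in the returned mask, holds here by construction: each CPU indexes the queue it belongs to.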
Signed-off-by: Ming Lei
---
 include/linux/blk-mq.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 15d1aa53d96c..56f2e2ed62a7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,7 +140,8 @@ typedef int (poll_fn)(struct blk_mq_hw_ctx *);
 typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
 typedef bool (busy_fn)(struct request_queue *);
 typedef void (complete_fn)(struct request *);
-
+typedef const struct cpumask *(hctx_complete_queue_affinity_fn)(
+		struct blk_mq_hw_ctx *, int);
 
 struct blk_mq_ops {
 	/*
@@ -207,6 +208,15 @@ struct blk_mq_ops {
 	map_queues_fn		*map_queues;
 
+	/*
+	 * Some SCSI devices support private completion queues. Returns
+	 * the affinity of the completion queue; the passed 'cpu' parameter
+	 * has to be included in the completion queue's affinity cpumask,
+	 * and is used to figure out the mapped reply queue. If NULL is
+	 * returned, this hctx has no private completion queues.
+	 */
+	hctx_complete_queue_affinity_fn *complete_queue_affinity;
+
 #ifdef CONFIG_BLK_DEBUG_FS
 	/*
 	 * Used by the debugfs implementation to show driver-specific

From patchwork Mon May 27 15:02:05 2019
From: Ming Lei
To: Jens Axboe, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, James Bottomley, linux-scsi@vger.kernel.org, Bart Van Assche, Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig, Ming Lei
Subject: [PATCH V2 3/5] scsi: core: implement callback of .complete_queue_affinity
Date: Mon, 27 May 2019 23:02:05 +0800
Message-Id: <20190527150207.11372-4-ming.lei@redhat.com>
In-Reply-To: <20190527150207.11372-1-ming.lei@redhat.com>
References: <20190527150207.11372-1-ming.lei@redhat.com>

Implement the SCSI core's .complete_queue_affinity callback, so that in-flight requests can be drained when the SCSI HBA supports multiple completion queues.
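The midlayer's role here is pure delegation: forward the query to the low-level driver only when the driver filled in the optional hook, and return NULL otherwise. A userspace sketch of that pattern, with all names illustrative rather than the SCSI API:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct scsi_host_template with one optional hook. */
struct host_template_model {
    const unsigned long *(*complete_queue_affinity)(int cpu);
};

static const unsigned long q0_mask = 0x3; /* hypothetical queue-0 cpumask */

/* A driver ("LLD") that implements the hook. */
static const unsigned long *lld_affinity(int cpu)
{
    (void)cpu;
    return &q0_mask;
}

static const struct host_template_model with_hook = { lld_affinity };
static const struct host_template_model without_hook = { NULL };

/* Model of scsi_complete_queue_affinity(): delegate if present,
 * else report "no private completion queues" with NULL. */
static const unsigned long *core_affinity(const struct host_template_model *t,
                                          int cpu)
{
    if (t->complete_queue_affinity)
        return t->complete_queue_affinity(cpu);
    return NULL;
}
```

Returning NULL for drivers without the hook lets blk-mq fall back to its default behavior, matching the Status: OPTIONAL annotation in the template.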
Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c  | 14 ++++++++++++++
 include/scsi/scsi_host.h | 10 ++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65d0a10c76ad..ac57dc98a8c0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1750,6 +1750,19 @@ static int scsi_map_queues(struct blk_mq_tag_set *set)
 	return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
 }
 
+static const struct cpumask *
+scsi_complete_queue_affinity(struct blk_mq_hw_ctx *hctx, int cpu)
+{
+	struct request_queue *q = hctx->queue;
+	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+
+	if (shost->hostt->complete_queue_affinity)
+		return shost->hostt->complete_queue_affinity(shost, cpu);
+
+	return NULL;
+}
+
 void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
 	struct device *dev = shost->dma_dev;
@@ -1802,6 +1815,7 @@ static const struct blk_mq_ops scsi_mq_ops = {
 	.initialize_rq_fn = scsi_initialize_rq,
 	.busy		= scsi_mq_lld_busy,
 	.map_queues	= scsi_map_queues,
+	.complete_queue_affinity = scsi_complete_queue_affinity,
 };
 
 struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a5fcdad4a03e..65ccac1429a1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -268,6 +268,16 @@ struct scsi_host_template {
 	 */
 	int (* map_queues)(struct Scsi_Host *shost);
 
+	/*
+	 * This function lets the driver expose the completion queue's CPU
+	 * affinity to the block layer. @cpu is used for retrieving the
+	 * mapped completion queue.
+	 *
+	 * Status: OPTIONAL
+	 */
+	const struct cpumask * (* complete_queue_affinity)(struct Scsi_Host *,
+							   int cpu);
+
 	/*
 	 * This function determines the BIOS parameters for a given
 	 * harddisk.  These tend to be numbers that are made up by

From patchwork Mon May 27 15:02:06 2019
From: Ming Lei
To: Jens Axboe, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, James Bottomley, linux-scsi@vger.kernel.org, Bart Van Assche, Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig, Ming Lei
Subject: [PATCH V2 4/5] scsi: implement .complete_queue_affinity
Date: Mon, 27 May 2019 23:02:06 +0800
Message-Id: <20190527150207.11372-5-ming.lei@redhat.com>
In-Reply-To: <20190527150207.11372-1-ming.lei@redhat.com>
References: <20190527150207.11372-1-ming.lei@redhat.com>

Implement the .complete_queue_affinity callback for all in-tree drivers which support private completion queues.

Signed-off-by: Ming Lei
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    | 11 +++++++++++
 drivers/scsi/hpsa.c                       | 12 ++++++++++++
 drivers/scsi/megaraid/megaraid_sas_base.c | 10 ++++++++++
 drivers/scsi/mpt3sas/mpt3sas_scsih.c      | 11 +++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 49620c2411df..799ee15c8786 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2896,6 +2896,16 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
 	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 }
 
+static const struct cpumask *
+hisi_sas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct hisi_hba *hisi_hba = shost_priv(sh);
+	unsigned reply_queue = hisi_hba->reply_map[cpu];
+
+	return pci_irq_get_affinity(hisi_hba->pci_dev,
+				    reply_queue + BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -2917,6 +2927,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset		= hisi_sas_host_reset,
+	.complete_queue_affinity = hisi_sas_complete_queue_affinity,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 72f9edb86752..87d37f945c76 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -271,6 +271,8 @@ static void hpsa_free_cmd_pool(struct ctlr_info *h);
 #define VPD_PAGE (1 << 8)
 #define HPSA_SIMPLE_ERROR_BITS 0x03
 
+static const struct cpumask *hpsa_complete_queue_affinity(
+		struct Scsi_Host *, int);
 static int hpsa_scsi_queue_command(struct Scsi_Host *h, struct scsi_cmnd *cmd);
 static void hpsa_scan_start(struct Scsi_Host *);
 static int hpsa_scan_finished(struct Scsi_Host *sh,
@@ -962,6 +964,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.name			= HPSA,
 	.proc_name		= HPSA,
 	.queuecommand		= hpsa_scsi_queue_command,
+	.complete_queue_affinity = hpsa_complete_queue_affinity,
 	.scan_start		= hpsa_scan_start,
 	.scan_finished		= hpsa_scan_finished,
 	.change_queue_depth	= hpsa_change_queue_depth,
@@ -4824,6 +4827,15 @@ static int hpsa_scsi_ioaccel_direct_map(struct ctlr_info *h,
 		cmd->cmnd, cmd->cmd_len, dev->scsi3addr, dev);
 }
 
+static const struct cpumask *
+hpsa_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct ctlr_info *h = shost_to_hba(sh);
+	unsigned reply_queue = h->reply_map[cpu];
+
+	return pci_irq_get_affinity(h->pdev, reply_queue);
+}
+
 /*
  * Set encryption parameters for the ioaccel2 request
  */
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3dd1df472dc6..59b71e8f98a8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3165,6 +3165,15 @@ megasas_fw_cmds_outstanding_show(struct device *cdev,
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 			atomic_read(&instance->fw_outstanding));
 }
 
+static const struct cpumask *
+megasas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct megasas_instance *instance =
+		(struct megasas_instance *)sh->hostdata;
+	unsigned reply_queue = instance->reply_map[cpu];
+
+	return pci_irq_get_affinity(instance->pdev, reply_queue);
+}
+
 static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
 	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
 static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
@@ -3208,6 +3217,7 @@ static struct scsi_host_template megasas_template = {
 	.bios_param = megasas_bios_param,
 	.change_queue_depth = scsi_change_queue_depth,
 	.no_write_same = 1,
+	.complete_queue_affinity = megasas_complete_queue_affinity,
 };
 
 /**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 1ccfbc7eebe0..2db1d6fc4bda 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -10161,6 +10161,15 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	return 1;
 }
 
+static const struct cpumask *
+mpt3sas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct MPT3SAS_ADAPTER *ioc = shost_priv(sh);
+	unsigned reply_queue = ioc->cpu_msix_table[cpu];
+
+	return pci_irq_get_affinity(ioc->pdev, reply_queue);
+}
+
 /* shost template for SAS 2.0 HBA devices */
 static struct scsi_host_template mpt2sas_driver_template = {
 	.module				= THIS_MODULE,
@@ -10189,6 +10198,7 @@ static struct scsi_host_template mpt2sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.complete_queue_affinity	= mpt3sas_complete_queue_affinity,
 };
 
 /* raid transport support for SAS 2.0 HBA devices */
@@ -10227,6 +10237,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.complete_queue_affinity	= mpt3sas_complete_queue_affinity,
 };
 
 /* raid transport support for SAS 3.0 HBA devices */

From patchwork Mon May 27 15:02:07 2019
From: Ming Lei
To: Jens Axboe, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, James Bottomley, linux-scsi@vger.kernel.org, Bart Van Assche, Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig, Ming Lei
Subject: [PATCH V2 5/5] blk-mq: Wait for hctx inflight requests on CPU unplug
Date: Mon, 27 May 2019 23:02:07 +0800
Message-Id: <20190527150207.11372-6-ming.lei@redhat.com>
In-Reply-To: <20190527150207.11372-1-ming.lei@redhat.com>
References: <20190527150207.11372-1-ming.lei@redhat.com>

Managed interrupts cannot migrate their affinity when their CPUs go offline. If a CPU is allowed to shut down before the interrupts are returned, commands dispatched to managed queues cannot complete through their irq handlers.

Wait in the cpu hotplug handler until all in-flight requests on the tags are completed or time out. Wait once per tags, so we can save time in the case of shared tags.

Based on the following patch from Keith, using a simple delay-spin instead:

https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/

Some SCSI devices may have a single blk_mq hw queue and multiple private completion queues; in that case, wait until all requests on the private completion queue are completed.
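The drain logic boils down to counting in-flight requests whose submitting CPU falls inside the completion queue's cpumask, then spinning until that count hits zero. A userspace model of the counting step, with a plain array standing in for the tag set and a bitmask standing in for the cpumask (all names illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Model of a request's blk-mq state. */
enum rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT };

struct rq_model {
    enum rq_state state;
    int cpu; /* submitting CPU, as blk_mq_rq_cpu() would report */
};

/* Hypothetical tag set: two in-flight requests (CPUs 0 and 2), one idle. */
static const struct rq_model sample[] = {
    { MQ_RQ_IN_FLIGHT, 0 },
    { MQ_RQ_IDLE,      1 },
    { MQ_RQ_IN_FLIGHT, 2 },
};

/* Model of blk_mq_tags_inflight_rqs(): count in-flight requests whose
 * CPU is in the completion queue's mask. The real code walks the tag
 * set with blk_mq_all_tag_busy_iter(); here it's a plain loop. */
static unsigned count_inflight(const struct rq_model *rqs, size_t n,
                               unsigned long completion_cpus /* bitmask */)
{
    unsigned cnt = 0;

    for (size_t i = 0; i < n; i++)
        if (rqs[i].state == MQ_RQ_IN_FLIGHT &&
            (completion_cpus & (1ul << rqs[i].cpu)))
            cnt++;
    return cnt;
}
```

The hotplug handler's drain is then just a loop: while this count is nonzero, sleep briefly and recount; the BLK_MQ_TAGS_DRAINED bit skips repeat drains of shared tags when no private completion queues are involved.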
Signed-off-by: Ming Lei
---
 block/blk-mq-tag.c |  2 +-
 block/blk-mq-tag.h |  5 +++
 block/blk-mq.c     | 94 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7513c8eaabee..b24334f99c5d 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -332,7 +332,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
  */
-static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..9ce7606a87f0 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -19,6 +19,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+#define BLK_MQ_TAGS_DRAINED	0
+	unsigned long flags;
 };
 
@@ -35,6 +38,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+		busy_tag_iter_fn *fn, void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b8ad3d341b..ab1fbfd48374 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2215,6 +2215,65 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx =
+		hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+
+	if (hctx->tags)
+		clear_bit(BLK_MQ_TAGS_DRAINED, &hctx->tags->flags);
+
+	return 0;
+}
+
+struct blk_mq_inflight_rq_data {
+	unsigned cnt;
+	const struct cpumask *cpumask;
+};
+
+static bool blk_mq_count_inflight_rq(struct request *rq, void *data,
+				     bool reserved)
+{
+	struct blk_mq_inflight_rq_data *count = data;
+
+	if ((blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) &&
+	    cpumask_test_cpu(blk_mq_rq_cpu(rq), count->cpumask))
+		count->cnt++;
+
+	return true;
+}
+
+unsigned blk_mq_tags_inflight_rqs(struct blk_mq_tags *tags,
+		const struct cpumask *completion_cpus)
+{
+	struct blk_mq_inflight_rq_data data = {
+		.cnt = 0,
+		.cpumask = completion_cpus,
+	};
+
+	blk_mq_all_tag_busy_iter(tags, blk_mq_count_inflight_rq, &data);
+
+	return data.cnt;
+}
+
+static void blk_mq_drain_inflight_rqs(struct blk_mq_tags *tags,
+		const struct cpumask *completion_cpus)
+{
+	if (!tags)
+		return;
+
+	/* Can't apply the optimization in case of private completion queues */
+	if (completion_cpus == cpu_all_mask &&
+	    test_and_set_bit(BLK_MQ_TAGS_DRAINED, &tags->flags))
+		return;
+
+	while (1) {
+		if (!blk_mq_tags_inflight_rqs(tags, completion_cpus))
+			break;
+		msleep(5);
+	}
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -2226,6 +2285,8 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 	enum hctx_type type;
+	struct request_queue *q;
+	const struct cpumask *cpumask = NULL, *completion_cpus;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
@@ -2238,14 +2299,32 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	}
 	spin_unlock(&ctx->lock);
 
-	if (list_empty(&tmp))
-		return 0;
+	if (!list_empty(&tmp)) {
+		spin_lock(&hctx->lock);
+		list_splice_tail_init(&tmp, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
 
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+		blk_mq_run_hw_queue(hctx, true);
+	}
+
+	/*
+	 * Interrupt for the current completion queue will be shutdown, so
+	 * wait until all requests on this queue are completed.
+	 */
+	q = hctx->queue;
+	if (q->mq_ops->complete_queue_affinity)
+		cpumask = q->mq_ops->complete_queue_affinity(hctx, cpu);
+
+	if (!cpumask) {
+		cpumask = hctx->cpumask;
+		completion_cpus = cpu_all_mask;
+	} else {
+		completion_cpus = cpumask;
+	}
+
+	if (cpumask_first_and(cpumask, cpu_online_mask) >= nr_cpu_ids)
+		blk_mq_drain_inflight_rqs(hctx->tags, completion_cpus);
 
-	blk_mq_run_hw_queue(hctx, true);
 	return 0;
 }
 
@@ -3541,7 +3620,8 @@ EXPORT_SYMBOL(blk_mq_rq_cpu);
 
 static int __init blk_mq_init(void)
 {
-	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead",
+				blk_mq_hctx_notify_prepare,
 				blk_mq_hctx_notify_dead);
 	return 0;
 }