From patchwork Tue Apr 23 10:32:39 2019
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10912703
From: Ming Lei
To: James Bottomley, linux-scsi@vger.kernel.org, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, Ming Lei, Christoph Hellwig,
    Bart Van Assche, "Ewan D. Milne", Hannes Reinecke
Subject: [PATCH 1/2] scsi: core: avoid to pre-allocate big chunk for protection meta data
Date: Tue, 23 Apr 2019 18:32:39 +0800
Message-Id: <20190423103240.29864-2-ming.lei@redhat.com>
In-Reply-To: <20190423103240.29864-1-ming.lei@redhat.com>
References: <20190423103240.29864-1-ming.lei@redhat.com>
List-ID: linux-block@vger.kernel.org

scsi_mq_setup_tags() currently pre-allocates a big buffer for the
protection sg list, and the buffer size is scsi_mq_sgl_size(). That
isn't correct: scsi_mq_sgl_size() is meant for sizing the sg list for
IO data, and the protection data buffer is much smaller. For example,
one 512-byte sector needs 8 bytes of protection data, and the max
sector count for one request is 2560 (BLK_DEF_MAX_SECTORS), so the max
protection data size is just 20KB.

The usual case is that one bio builds one single bip segment. Thanks to
bio splitting, bio merging is seldom done for big IO and only happens
for small bios, so the bip segment count is usually the same as the bio
count in the request. That count won't be very big, and allocating from
slab should be fast enough.

Reduce the pre-allocation to one sg entry for protection data, and
switch to runtime allocation from slab when the protection data segment
count is bigger than 1. This saves a huge pre-allocation for protection
data; for example, 500+ MB can be saved on lpfc.

Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Ewan D. Milne
Cc: Hannes Reinecke
Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 07dfc17d4824..bdcf40851356 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -39,6 +39,12 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
+/*
+ * Size of integrity meta data size is usually small, 1 inline sg
+ * should cover normal cases.
+ */
+#define SCSI_INLINE_PROT_SG_CNT	1
+
 static struct kmem_cache *scsi_sdb_cache;
 static struct kmem_cache *scsi_sense_cache;
 static struct kmem_cache *scsi_sense_isadma_cache;
@@ -553,12 +559,21 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 	}
 }
 
+static inline bool scsi_prot_use_inline_sg(struct scsi_cmnd *cmd)
+{
+	if (!scsi_prot_sglist(cmd))
+		return false;
+
+	return cmd->prot_sdb->table.sgl ==
+		(struct scatterlist *)(cmd->prot_sdb + 1);
+}
+
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
 	if (cmd->sdb.table.nents)
 		sg_free_table_chained(&cmd->sdb.table, true);
-	if (scsi_prot_sg_count(cmd))
-		sg_free_table_chained(&cmd->prot_sdb->table, true);
+	if (scsi_prot_sg_count(cmd) && !scsi_prot_use_inline_sg(cmd))
+		sg_free_table_chained(&cmd->prot_sdb->table, false);
 }
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
@@ -1044,9 +1059,11 @@ blk_status_t scsi_init_io(struct scsi_cmnd *cmd)
 		}
 
 		ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
-
-		if (sg_alloc_table_chained(&prot_sdb->table, ivecs,
-				prot_sdb->table.sgl)) {
+		if (ivecs <= SCSI_INLINE_PROT_SG_CNT)
+			prot_sdb->table.nents = prot_sdb->table.orig_nents =
+				SCSI_INLINE_PROT_SG_CNT;
+		else if (sg_alloc_table_chained(&prot_sdb->table, ivecs,
+				NULL)) {
 			ret = BLK_STS_RESOURCE;
 			goto out_free_sgtables;
 		}
@@ -1846,7 +1863,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	sgl_size = scsi_mq_sgl_size(shost);
 	cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size + sgl_size;
 	if (scsi_host_get_prot(shost))
-		cmd_size += sizeof(struct scsi_data_buffer) + sgl_size;
+		cmd_size += sizeof(struct scsi_data_buffer) +
+			sizeof(struct scatterlist) * SCSI_INLINE_PROT_SG_CNT;
 	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
 	shost->tag_set.ops = &scsi_mq_ops;

From patchwork Tue Apr 23 10:32:40 2019
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10912709
From: Ming Lei
To: James Bottomley, linux-scsi@vger.kernel.org, "Martin K. Petersen"
Cc: linux-block@vger.kernel.org, Ming Lei, Christoph Hellwig,
    Bart Van Assche, "Ewan D. Milne", Hannes Reinecke
Subject: [PATCH 2/2] scsi: core: avoid to pre-allocate big chunk for sg list
Date: Tue, 23 Apr 2019 18:32:40 +0800
Message-Id: <20190423103240.29864-3-ming.lei@redhat.com>
In-Reply-To: <20190423103240.29864-1-ming.lei@redhat.com>
References: <20190423103240.29864-1-ming.lei@redhat.com>
List-ID: linux-block@vger.kernel.org

scsi_mq_setup_tags() currently pre-allocates a big buffer for the IO sg
list, and the buffer size is scsi_mq_sgl_size(), which depends on the
smaller of shost->sg_tablesize and SG_CHUNK_SIZE. Modern HBAs are often
capable of DMA to a very big segment count, so scsi_mq_sgl_size() is
often big. If the max segment count of SG_CHUNK_SIZE is taken,
scsi_mq_sgl_size() is 4KB.

If one HBA then has lots of queues, each with a big depth, the whole
pre-allocation for the sg list can consume huge memory. For example on
lpfc, nr_hw_queues can be 70 and each queue's depth can be 3781, so the
pre-allocation for the data sg list is 70 * 3781 * 2KB = 517MB for a
single HBA.

There is also an internal Red Hat report that scsi_debug based tests
can no longer be run, now that the legacy IO path has been removed,
because the pre-allocation is too big.

Switch to runtime allocation for the sg list, while pre-allocating 2
inline sg entries.
This way has been applied to NVMe for a while, so it should be fine for
SCSI too.

Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Ewan D. Milne
Cc: Hannes Reinecke
Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bdcf40851356..4fff95b14c91 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -45,6 +45,8 @@
  */
 #define SCSI_INLINE_PROT_SG_CNT	1
 
+#define SCSI_INLINE_SG_CNT	2
+
 static struct kmem_cache *scsi_sdb_cache;
 static struct kmem_cache *scsi_sense_cache;
 static struct kmem_cache *scsi_sense_isadma_cache;
@@ -568,10 +570,18 @@ static inline bool scsi_prot_use_inline_sg(struct scsi_cmnd *cmd)
 		(struct scatterlist *)(cmd->prot_sdb + 1);
 }
 
+static bool scsi_use_inline_sg(struct scsi_cmnd *cmd)
+{
+	struct scatterlist *sg = (void *)cmd + sizeof(struct scsi_cmnd) +
+		cmd->device->host->hostt->cmd_size;
+
+	return cmd->sdb.table.sgl == sg;
+}
+
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
-	if (cmd->sdb.table.nents)
-		sg_free_table_chained(&cmd->sdb.table, true);
+	if (cmd->sdb.table.nents && !scsi_use_inline_sg(cmd))
+		sg_free_table_chained(&cmd->sdb.table, false);
 	if (scsi_prot_sg_count(cmd) && !scsi_prot_use_inline_sg(cmd))
 		sg_free_table_chained(&cmd->prot_sdb->table, false);
 }
@@ -1002,12 +1012,16 @@ static blk_status_t scsi_init_sgtable(struct request *req,
 		struct scsi_data_buffer *sdb)
 {
 	int count;
+	unsigned nr_segs = blk_rq_nr_phys_segments(req);
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	if (unlikely(sg_alloc_table_chained(&sdb->table,
-			blk_rq_nr_phys_segments(req), sdb->table.sgl)))
+	if (nr_segs <= SCSI_INLINE_SG_CNT)
+		sdb->table.nents = sdb->table.orig_nents =
+			SCSI_INLINE_SG_CNT;
+	else if (unlikely(sg_alloc_table_chained(&sdb->table, nr_segs,
+			NULL)))
 		return BLK_STS_RESOURCE;
 
 	/*
@@ -1574,9 +1588,9 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 }
 
 /* Size in bytes of the sg-list stored in the scsi-mq command-private data. */
-static unsigned int scsi_mq_sgl_size(struct Scsi_Host *shost)
+static unsigned int scsi_mq_inline_sgl_size(struct Scsi_Host *shost)
 {
-	return min_t(unsigned int, shost->sg_tablesize, SG_CHUNK_SIZE) *
+	return min_t(unsigned int, shost->sg_tablesize, SCSI_INLINE_SG_CNT) *
 		sizeof(struct scatterlist);
 }
 
@@ -1766,7 +1780,7 @@ static int scsi_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 	if (scsi_host_get_prot(shost)) {
 		sg = (void *)cmd + sizeof(struct scsi_cmnd) +
 			shost->hostt->cmd_size;
-		cmd->prot_sdb = (void *)sg + scsi_mq_sgl_size(shost);
+		cmd->prot_sdb = (void *)sg + scsi_mq_inline_sgl_size(shost);
 	}
 
 	return 0;
@@ -1860,7 +1874,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 {
 	unsigned int cmd_size, sgl_size;
 
-	sgl_size = scsi_mq_sgl_size(shost);
+	sgl_size = scsi_mq_inline_sgl_size(shost);
 	cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size + sgl_size;
 	if (scsi_host_get_prot(shost))
 		cmd_size += sizeof(struct scsi_data_buffer) +