Message ID | 20190816023649.16682-1-jsmart2021@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Series | [v3] lpfc: Mitigate high memory pre-allocation by SCSI-MQ |
On Thu, 2019-08-15 at 19:36 -0700, James Smart wrote:
> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiplies this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.
>
> After testing several scenarios, the situation can be mitigated by
> limiting the value set in shost->nr_hw_queues to 4. Although the shost
> values were changed, the driver still had per-cpu hardware queues of
> its own that allowed parallelization per-cpu. Testing revealed that
> even with the smallish number for nr_hw_queues for SCSI-MQ, performance
> levels remained near maximum with the within-driver affinitization.
>
> A module parameter was created to allow the value set for the
> nr_hw_queues to be tunable.
>
> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> ---
> v3: add Ming's reviewed-by tag
> ---
>  drivers/scsi/lpfc/lpfc.h      |  1 +
>  drivers/scsi/lpfc/lpfc_attr.c | 15 +++++++++++++++
>  drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
>  drivers/scsi/lpfc/lpfc_sli4.h |  5 +++++
>  4 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
> index 2c3bb8a966e5..bade2e025ecf 100644
> --- a/drivers/scsi/lpfc/lpfc.h
> +++ b/drivers/scsi/lpfc/lpfc.h
> @@ -824,6 +824,7 @@ struct lpfc_hba {
>  	uint32_t cfg_cq_poll_threshold;
>  	uint32_t cfg_cq_max_proc_limit;
>  	uint32_t cfg_fcp_cpu_map;
> +	uint32_t cfg_fcp_mq_threshold;
>  	uint32_t cfg_hdw_queue;
>  	uint32_t cfg_irq_chann;
>  	uint32_t cfg_suppress_rsp;
> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> index ea62322ffe2b..8d8c495b5b60 100644
> --- a/drivers/scsi/lpfc/lpfc_attr.c
> +++ b/drivers/scsi/lpfc/lpfc_attr.c
> @@ -5709,6 +5709,19 @@ LPFC_ATTR_RW(nvme_embed_cmd, 1, 0, 2,
>  	     "Embed NVME Command in WQE");
>
>  /*
> + * lpfc_fcp_mq_threshold: Set the maximum number of Hardware Queues
> + * the driver will advertise it supports to the SCSI layer.
> + *
> + * 0 = Set nr_hw_queues by the number of CPUs or HW queues.
> + * 1,128 = Manually specify the maximum nr_hw_queue value to be set,
> + *
> + * Value range is [0,128]. Default value is 8.
> + */
> +LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF,
> +	    LPFC_FCP_MQ_THRESHOLD_MIN, LPFC_FCP_MQ_THRESHOLD_MAX,
> +	    "Set the number of SCSI Queues advertised");
> +
> +/*
>   * lpfc_hdw_queue: Set the number of Hardware Queues the driver
>   * will advertise it supports to the NVME and SCSI layers. This also
>   * will map to the number of CQ/WQ pairs the driver will create.
> @@ -6030,6 +6043,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
>  	&dev_attr_lpfc_cq_poll_threshold,
>  	&dev_attr_lpfc_cq_max_proc_limit,
>  	&dev_attr_lpfc_fcp_cpu_map,
> +	&dev_attr_lpfc_fcp_mq_threshold,
>  	&dev_attr_lpfc_hdw_queue,
>  	&dev_attr_lpfc_irq_chann,
>  	&dev_attr_lpfc_suppress_rsp,
> @@ -7112,6 +7126,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
>  	/* Initialize first burst. Target vs Initiator are different. */
>  	lpfc_nvme_enable_fb_init(phba, lpfc_nvme_enable_fb);
>  	lpfc_nvmet_fb_size_init(phba, lpfc_nvmet_fb_size);
> +	lpfc_fcp_mq_threshold_init(phba, lpfc_fcp_mq_threshold);
>  	lpfc_hdw_queue_init(phba, lpfc_hdw_queue);
>  	lpfc_irq_chann_init(phba, lpfc_irq_chann);
>  	lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index faf43b1d3dbe..03998579d6ee 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -4309,10 +4309,12 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
>  	shost->max_cmd_len = 16;
>
>  	if (phba->sli_rev == LPFC_SLI_REV4) {
> -		if (phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ)
> -			shost->nr_hw_queues = phba->cfg_hdw_queue;
> -		else
> -			shost->nr_hw_queues = phba->sli4_hba.num_present_cpu;
> +		if (!phba->cfg_fcp_mq_threshold ||
> +		    phba->cfg_fcp_mq_threshold > phba->cfg_hdw_queue)
> +			phba->cfg_fcp_mq_threshold = phba->cfg_hdw_queue;
> +
> +		shost->nr_hw_queues = min_t(int, 2 * num_possible_nodes(),
> +					    phba->cfg_fcp_mq_threshold);
>
>  		shost->dma_boundary =
>  			phba->sli4_hba.pc_sli4_params.sge_supp_len-1;
> diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
> index 3aeca387b22a..329f7aa7e169 100644
> --- a/drivers/scsi/lpfc/lpfc_sli4.h
> +++ b/drivers/scsi/lpfc/lpfc_sli4.h
> @@ -44,6 +44,11 @@
>  #define LPFC_HBA_HDWQ_MAX	128
>  #define LPFC_HBA_HDWQ_DEF	0
>
> +/* FCP MQ queue count limiting */
> +#define LPFC_FCP_MQ_THRESHOLD_MIN	0
> +#define LPFC_FCP_MQ_THRESHOLD_MAX	128
> +#define LPFC_FCP_MQ_THRESHOLD_DEF	8
> +
>  /* Common buffer size to accomidate SCSI and NVME IO buffers */
>  #define LPFC_COMMON_IO_BUF_SZ	768

Looks good.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
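Editor's note: the commit message above says the SCSI-MQ pre-allocation for a single shost can reach "the multiple GBytes range" once nr_hw_queues tracks the CPU count, and that NPIV vports multiply it further. The sketch below is a purely illustrative back-of-the-envelope calculation of that scaling; every number in it (queue count, queue depth, per-request footprint, vport count) is hypothetical and is not taken from lpfc or from any measurement in this thread.

```c
#include <stdio.h>

int main(void)
{
	/* All values hypothetical -- chosen only to show the multipliers. */
	unsigned long long nr_hw_queues = 128;        /* ~ one per CPU          */
	unsigned long long queue_depth  = 1024;       /* requests per hw queue  */
	unsigned long long per_request  = 32 * 1024;  /* cmd + SGL + sense, say */
	unsigned long long npiv_shosts  = 4;          /* physical port + vports */

	/* blk-mq style pre-allocation scales with queues * depth * size */
	unsigned long long per_shost = nr_hw_queues * queue_depth * per_request;
	unsigned long long total     = per_shost * npiv_shosts;

	printf("per shost: ~%llu GiB, all shosts: ~%llu GiB\n",
	       per_shost >> 30, total >> 30);
	return 0;
}
```

With these made-up inputs a single shost pins about 4 GiB and four shosts about 16 GiB, which is the shape of the problem the patch mitigates by capping the advertised nr_hw_queues rather than the driver's internal hardware queue count.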
James,

> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiplies this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.

Applied to 5.3/scsi-fixes. Thanks!
On 8/16/19 4:36 AM, James Smart wrote:
> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
> MQ resources based on shost values set by the driver. In newer cases
> of the driver, which attempts to set nr_hw_queues to the cpu count,
> the multipliers become excessive, with a single shost having SCSI-MQ
> pre-allocation reaching into the multiple GBytes range. NPIV, which
> creates additional shosts, only multiplies this overhead. On lower-memory
> systems, this can exhaust system memory very quickly, resulting in a
> system crash or failures in the driver or elsewhere due to low memory
> conditions.
>
> After testing several scenarios, the situation can be mitigated by
> limiting the value set in shost->nr_hw_queues to 4. Although the shost
> values were changed, the driver still had per-cpu hardware queues of
> its own that allowed parallelization per-cpu. Testing revealed that
> even with the smallish number for nr_hw_queues for SCSI-MQ, performance
> levels remained near maximum with the within-driver affinitization.
>
> A module parameter was created to allow the value set for the
> nr_hw_queues to be tunable.
>
> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> ---
> v3: add Ming's reviewed-by tag
> ---
>  drivers/scsi/lpfc/lpfc.h      |  1 +
>  drivers/scsi/lpfc/lpfc_attr.c | 15 +++++++++++++++
>  drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
>  drivers/scsi/lpfc/lpfc_sli4.h |  5 +++++
>  4 files changed, 27 insertions(+), 4 deletions(-)
>
Well, that doesn't actually match with my measurements (where I've seen
max I/O performance at about 16 queues); so I guess this is pretty much
setup-specific.

However, I'm somewhat loath to have a cap at 128; we actually have
several machines where we'll be having more CPUs than that.
Can't we increase the cap to 512 to give us a bit more leeway during
testing?

Cheers,

Hannes
On 8/26/2019 12:18 AM, Hannes Reinecke wrote:
> On 8/16/19 4:36 AM, James Smart wrote:
>> When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of
>> MQ resources based on shost values set by the driver. In newer cases
>> of the driver, which attempts to set nr_hw_queues to the cpu count,
>> the multipliers become excessive, with a single shost having SCSI-MQ
>> pre-allocation reaching into the multiple GBytes range. NPIV, which
>> creates additional shosts, only multiplies this overhead. On lower-memory
>> systems, this can exhaust system memory very quickly, resulting in a
>> system crash or failures in the driver or elsewhere due to low memory
>> conditions.
>>
>> After testing several scenarios, the situation can be mitigated by
>> limiting the value set in shost->nr_hw_queues to 4. Although the shost
>> values were changed, the driver still had per-cpu hardware queues of
>> its own that allowed parallelization per-cpu. Testing revealed that
>> even with the smallish number for nr_hw_queues for SCSI-MQ, performance
>> levels remained near maximum with the within-driver affinitization.
>>
>> A module parameter was created to allow the value set for the
>> nr_hw_queues to be tunable.
>>
>> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
>> Signed-off-by: James Smart <jsmart2021@gmail.com>
>> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>>
>> ---
>> v3: add Ming's reviewed-by tag
>> ---
>>  drivers/scsi/lpfc/lpfc.h      |  1 +
>>  drivers/scsi/lpfc/lpfc_attr.c | 15 +++++++++++++++
>>  drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
>>  drivers/scsi/lpfc/lpfc_sli4.h |  5 +++++
>>  4 files changed, 27 insertions(+), 4 deletions(-)
>>
> Well, that doesn't actually match with my measurements (where I've seen
> max I/O performance at about 16 queues); so I guess this is pretty much
> setup-specific.

Keep in mind, when we ran our benchmarks, the driver was still using
per-cpu hdwq's selected by cpu #.

> However, I'm somewhat loath to have a cap at 128; we actually have
> several machines where we'll be having more CPUs than that.
> Can't we increase the cap to 512 to give us a bit more leeway during
> testing?

I'm fine if you want me to raise the max for the attribute. Keep in
mind, if 0, it can go > 128 to whatever the cpu number is, assuming
it's > 128.

-- james
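Editor's note: as an aid to following the exchange above, here is a standalone restatement of the nr_hw_queues selection the patch adds to lpfc_create_port(). Only the clamping logic mirrors the patch; the helper name and the example input values are invented for illustration and do not come from the driver.

```c
#include <stdio.h>

/*
 * Standalone restatement of the clamp added in lpfc_create_port():
 * a threshold of 0 (or one larger than the driver's own hardware
 * queue count) falls back to cfg_hdw_queue, and the result is then
 * limited to twice the number of possible NUMA nodes.
 */
static int effective_nr_hw_queues(unsigned int fcp_mq_threshold,
				  unsigned int cfg_hdw_queue,
				  unsigned int possible_numa_nodes)
{
	if (!fcp_mq_threshold || fcp_mq_threshold > cfg_hdw_queue)
		fcp_mq_threshold = cfg_hdw_queue;

	/* mirrors min_t(int, 2 * num_possible_nodes(), threshold) */
	if (2 * possible_numa_nodes < fcp_mq_threshold)
		return 2 * possible_numa_nodes;
	return (int)fcp_mq_threshold;
}

int main(void)
{
	/* Hypothetical: default threshold 8, 128 hdwqs, 4 NUMA nodes. */
	printf("%d\n", effective_nr_hw_queues(8, 128, 4));     /* -> 8   */
	/* Hypothetical: threshold 0, 256 hdwqs, 128 NUMA nodes. */
	printf("%d\n", effective_nr_hw_queues(0, 256, 128));   /* -> 256 */
	return 0;
}
```

The second (deliberately extreme) case illustrates the point made above: with the parameter left at 0 the advertised queue count is not capped at 128 but follows the driver's own hardware queue count, subject to the NUMA-node term.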
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 2c3bb8a966e5..bade2e025ecf 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -824,6 +824,7 @@ struct lpfc_hba {
 	uint32_t cfg_cq_poll_threshold;
 	uint32_t cfg_cq_max_proc_limit;
 	uint32_t cfg_fcp_cpu_map;
+	uint32_t cfg_fcp_mq_threshold;
 	uint32_t cfg_hdw_queue;
 	uint32_t cfg_irq_chann;
 	uint32_t cfg_suppress_rsp;
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index ea62322ffe2b..8d8c495b5b60 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5709,6 +5709,19 @@ LPFC_ATTR_RW(nvme_embed_cmd, 1, 0, 2,
 	     "Embed NVME Command in WQE");

 /*
+ * lpfc_fcp_mq_threshold: Set the maximum number of Hardware Queues
+ * the driver will advertise it supports to the SCSI layer.
+ *
+ * 0 = Set nr_hw_queues by the number of CPUs or HW queues.
+ * 1,128 = Manually specify the maximum nr_hw_queue value to be set,
+ *
+ * Value range is [0,128]. Default value is 8.
+ */
+LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF,
+	    LPFC_FCP_MQ_THRESHOLD_MIN, LPFC_FCP_MQ_THRESHOLD_MAX,
+	    "Set the number of SCSI Queues advertised");
+
+/*
  * lpfc_hdw_queue: Set the number of Hardware Queues the driver
  * will advertise it supports to the NVME and SCSI layers. This also
  * will map to the number of CQ/WQ pairs the driver will create.
@@ -6030,6 +6043,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
 	&dev_attr_lpfc_cq_poll_threshold,
 	&dev_attr_lpfc_cq_max_proc_limit,
 	&dev_attr_lpfc_fcp_cpu_map,
+	&dev_attr_lpfc_fcp_mq_threshold,
 	&dev_attr_lpfc_hdw_queue,
 	&dev_attr_lpfc_irq_chann,
 	&dev_attr_lpfc_suppress_rsp,
@@ -7112,6 +7126,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
 	/* Initialize first burst. Target vs Initiator are different. */
 	lpfc_nvme_enable_fb_init(phba, lpfc_nvme_enable_fb);
 	lpfc_nvmet_fb_size_init(phba, lpfc_nvmet_fb_size);
+	lpfc_fcp_mq_threshold_init(phba, lpfc_fcp_mq_threshold);
 	lpfc_hdw_queue_init(phba, lpfc_hdw_queue);
 	lpfc_irq_chann_init(phba, lpfc_irq_chann);
 	lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index faf43b1d3dbe..03998579d6ee 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -4309,10 +4309,12 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
 	shost->max_cmd_len = 16;

 	if (phba->sli_rev == LPFC_SLI_REV4) {
-		if (phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ)
-			shost->nr_hw_queues = phba->cfg_hdw_queue;
-		else
-			shost->nr_hw_queues = phba->sli4_hba.num_present_cpu;
+		if (!phba->cfg_fcp_mq_threshold ||
+		    phba->cfg_fcp_mq_threshold > phba->cfg_hdw_queue)
+			phba->cfg_fcp_mq_threshold = phba->cfg_hdw_queue;
+
+		shost->nr_hw_queues = min_t(int, 2 * num_possible_nodes(),
+					    phba->cfg_fcp_mq_threshold);

 		shost->dma_boundary =
 			phba->sli4_hba.pc_sli4_params.sge_supp_len-1;
diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
index 3aeca387b22a..329f7aa7e169 100644
--- a/drivers/scsi/lpfc/lpfc_sli4.h
+++ b/drivers/scsi/lpfc/lpfc_sli4.h
@@ -44,6 +44,11 @@
 #define LPFC_HBA_HDWQ_MAX	128
 #define LPFC_HBA_HDWQ_DEF	0

+/* FCP MQ queue count limiting */
+#define LPFC_FCP_MQ_THRESHOLD_MIN	0
+#define LPFC_FCP_MQ_THRESHOLD_MAX	128
+#define LPFC_FCP_MQ_THRESHOLD_DEF	8
+
 /* Common buffer size to accomidate SCSI and NVME IO buffers */
 #define LPFC_COMMON_IO_BUF_SZ	768
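Editor's note: the LPFC_FCP_MQ_THRESHOLD_* defines above are consumed by the LPFC_ATTR_R() macro, which defines the lpfc_fcp_mq_threshold module parameter and the lpfc_fcp_mq_threshold_init() helper called from lpfc_get_cfgparam() in the hunk above. The sketch below shows, roughly, the behaviour such a generated init helper is assumed to have (accept in-range values, otherwise fall back to the default); it is an illustrative simplification, not the macro's actual expansion from lpfc_attr.c.

```c
/*
 * Illustration only: approximate behaviour assumed for an
 * LPFC_ATTR_R()-generated init helper for fcp_mq_threshold.
 */
#define LPFC_FCP_MQ_THRESHOLD_MIN	0
#define LPFC_FCP_MQ_THRESHOLD_MAX	128
#define LPFC_FCP_MQ_THRESHOLD_DEF	8

static unsigned int cfg_fcp_mq_threshold;	/* stand-in for phba->cfg_fcp_mq_threshold */

static int fcp_mq_threshold_init_sketch(int val)
{
	if (val >= LPFC_FCP_MQ_THRESHOLD_MIN &&
	    val <= LPFC_FCP_MQ_THRESHOLD_MAX) {
		cfg_fcp_mq_threshold = val;	/* in range: use the requested value */
		return 0;
	}
	/* out of range: log (omitted here) and fall back to the default of 8 */
	cfg_fcp_mq_threshold = LPFC_FCP_MQ_THRESHOLD_DEF;
	return -1;
}
```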