| Message ID | 20191118123012.99664-1-hare@suse.de (mailing list archive) |
|---|---|
| State | Changes Requested |
| Series | lpfc: fixup out-of-bounds access during CPU hotplug |
Hi Hannes,

On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
>
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>

There are a few more places I think the check is needed:

  lpfc_nvme_fcp_io_submit(), lpfc_nvmet_ctxbuf_post(),
  lpfc_nvmet_xmt_fcp_op(), lpfc_nvmet_rcv_unsol_abort(),
  lpfc_nvme_io_cmd_wqe_cmpl(), lpfc_nvmet_unsol_fcp_buffer(),
  lpfc_scsi_cmd_iocb_cmpl()

At least all of them seem to use the CPU id to access some array.
I suggest we review all those places as well.

Thanks,
Daniel
Hi Hannes,

On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
>
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index ba26df90a36a..2380452a8efd 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>  	int tag;
>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>
> -	cpu = raw_smp_processor_id();
> +	cpu = min_t(u32, raw_smp_processor_id(),
> +		    phba->sli4_hba.num_possible_cpu);

The index is limited by phba->cfg_hdw_queue and not the number of CPUs.

Thanks,
Daniel
On 11/18/19 3:22 PM, Daniel Wagner wrote:
> Hi Hannes,
>
> On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
>> The lpfc driver allocates a cpu_map based on the number of possible
>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>> might change, causing an out-of-bounds access when trying to lookup
>> the hardware index for a given CPU.
>>
>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
>> index ba26df90a36a..2380452a8efd 100644
>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>>  	int tag;
>>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>>
>> -	cpu = raw_smp_processor_id();
>> +	cpu = min_t(u32, raw_smp_processor_id(),
>> +		    phba->sli4_hba.num_possible_cpu);
>
> The index is limited by phba->cfg_hdw_queue and not the number of CPUs.

Nope.

	phba->sli4_hba.cpu_map = kcalloc(phba->sli4_hba.num_possible_cpu,
					 sizeof(struct lpfc_vector_map_info),
					 GFP_KERNEL);

Cheers,

Hannes
> 	phba->sli4_hba.cpu_map = kcalloc(phba->sli4_hba.num_possible_cpu,
> 					 sizeof(struct lpfc_vector_map_info),
> 					 GFP_KERNEL);

Indeed, I didn't realize this.

Thanks,
Daniel
On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
>
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index ba26df90a36a..2380452a8efd 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>  	int tag;
>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>
> -	cpu = raw_smp_processor_id();
> +	cpu = min_t(u32, raw_smp_processor_id(),
> +		    phba->sli4_hba.num_possible_cpu);
>  	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>  		tag = blk_mq_unique_tag(cmnd->request);
>  		idx = blk_mq_unique_tag_to_hwq(tag);

This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 patches
that tie into cpu onlining/offlining and cpu hot add.

I am curious: if a cpu is hot removed, how does num_possible dynamically
change (to a lower value?) such that a thread can be running on a cpu
that returns a higher cpu number than num_possible?

-- james
On 11/18/19 8:28 PM, James Smart wrote:
> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>> The lpfc driver allocates a cpu_map based on the number of possible
>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>> might change, causing an out-of-bounds access when trying to lookup
>> the hardware index for a given CPU.
>>
>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
>> index ba26df90a36a..2380452a8efd 100644
>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>>  	int tag;
>>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>>
>> -	cpu = raw_smp_processor_id();
>> +	cpu = min_t(u32, raw_smp_processor_id(),
>> +		    phba->sli4_hba.num_possible_cpu);
>>  	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>  		tag = blk_mq_unique_tag(cmnd->request);
>>  		idx = blk_mq_unique_tag_to_hwq(tag);
>
> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 patches
> that tie into cpu onlining/offlining and cpu hot add.
>
> I am curious: if a cpu is hot removed, how does num_possible dynamically
> change (to a lower value?) such that a thread can be running on a cpu
> that returns a higher cpu number than num_possible?
>
Actually, the behaviour of num_possible_cpus() under cpu hotplug seems
to be implementation-specific.

One might want to set it to the max value at all times, which has the
drawback that you'd need to scale per-cpu values with that number,
leading to a massive memory overhead, as only very few installations
will ever reach that number.
Others might want to set it to the max value of the current
configuration, implying that it might change under CPU hotplug.
This seems to be the case for PowerPC, which is the configuration that
triggered the issue at hand.
But maybe it was a plain bug in the CPU hotplug implementation; one
should be asking BenH for details here.

Cheers,

Hannes
On 11/18/2019 12:21 PM, Hannes Reinecke wrote:
> On 11/18/19 8:28 PM, James Smart wrote:
>> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>>> The lpfc driver allocates a cpu_map based on the number of possible
>>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>>> might change, causing an out-of-bounds access when trying to lookup
>>> the hardware index for a given CPU.
>>>
>>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
>>> index ba26df90a36a..2380452a8efd 100644
>>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>>>  	int tag;
>>>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>>>
>>> -	cpu = raw_smp_processor_id();
>>> +	cpu = min_t(u32, raw_smp_processor_id(),
>>> +		    phba->sli4_hba.num_possible_cpu);
>>>  	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>>  		tag = blk_mq_unique_tag(cmnd->request);
>>>  		idx = blk_mq_unique_tag_to_hwq(tag);
>>
>> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2
>> patches that tie into cpu onlining/offlining and cpu hot add.
>>
>> I am curious: if a cpu is hot removed, how does num_possible
>> dynamically change (to a lower value?) such that a thread can be
>> running on a cpu that returns a higher cpu number than num_possible?
>>
> Actually, the behaviour of num_possible_cpus() under cpu hotplug seems
> to be implementation-specific.
> One might want to set it to the max value at all times, which has the
> drawback that you'd need to scale per-cpu values with that number,
> leading to a massive memory overhead, as only very few installations
> will ever reach that number.
> Others might want to set it to the max value of the current
> configuration, implying that it might change under CPU hotplug.
> This seems to be the case for PowerPC, which is the configuration that
> triggered the issue at hand.
> But maybe it was a plain bug in the CPU hotplug implementation; one
> should be asking BenH for details here.
>
> Cheers,
>
> Hannes

That sure seems to be at odds with this comment from
include/linux/cpumask.h:

 * The cpu_possible_mask is fixed at boot time, as the set of CPU id's
 * that it is possible might ever be plugged in at anytime during the
 * life of that system boot. The cpu_present_mask is dynamic(*),
 * representing which CPUs are currently plugged in. And
 * cpu_online_mask is the dynamic subset of cpu_present_mask,
 * indicating those CPUs available for scheduling.

and

 #define num_possible_cpus() cpumask_weight(cpu_possible_mask)

-- james
On 11/19/2019 9:17 AM, James Smart wrote:
> On 11/18/2019 12:21 PM, Hannes Reinecke wrote:
>> On 11/18/19 8:28 PM, James Smart wrote:
>>> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>>>> The lpfc driver allocates a cpu_map based on the number of possible
>>>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>>>> might change, causing an out-of-bounds access when trying to lookup
>>>> the hardware index for a given CPU.
>>>>
>>>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> ---
>>>>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
>>>> index ba26df90a36a..2380452a8efd 100644
>>>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>>>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>>>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>>>>  	int tag;
>>>>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>>>>
>>>> -	cpu = raw_smp_processor_id();
>>>> +	cpu = min_t(u32, raw_smp_processor_id(),
>>>> +		    phba->sli4_hba.num_possible_cpu);
>>>>  	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>>>  		tag = blk_mq_unique_tag(cmnd->request);
>>>>  		idx = blk_mq_unique_tag_to_hwq(tag);
>>>
>>> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2
>>> patches that tie into cpu onlining/offlining and cpu hot add.
>>>
>>> I am curious: if a cpu is hot removed, how does num_possible
>>> dynamically change (to a lower value?) such that a thread can be
>>> running on a cpu that returns a higher cpu number than num_possible?
>>>
>> Actually, the behaviour of num_possible_cpus() under cpu hotplug
>> seems to be implementation-specific.
>> One might want to set it to the max value at all times, which has the
>> drawback that you'd need to scale per-cpu values with that number,
>> leading to a massive memory overhead, as only very few installations
>> will ever reach that number.
>> Others might want to set it to the max value of the current
>> configuration, implying that it might change under CPU hotplug.
>> This seems to be the case for PowerPC, which is the configuration
>> that triggered the issue at hand.
>> But maybe it was a plain bug in the CPU hotplug implementation; one
>> should be asking BenH for details here.
>>
>> Cheers,
>>
>> Hannes
>
> That sure seems to be at odds with this comment from
> include/linux/cpumask.h:
>
> * The cpu_possible_mask is fixed at boot time, as the set of CPU id's
> * that it is possible might ever be plugged in at anytime during the
> * life of that system boot. The cpu_present_mask is dynamic(*),
> * representing which CPUs are currently plugged in. And
> * cpu_online_mask is the dynamic subset of cpu_present_mask,
> * indicating those CPUs available for scheduling.
>
> and
> #define num_possible_cpus() cpumask_weight(cpu_possible_mask)
>
> -- james

Although... num_possible_cpus() shouldn't change, but I guess a cpu id
can be greater than the possible cpu count.

But there is this as well, also from include/linux/cpumask.h:

 * If HOTPLUG is enabled, then cpu_possible_mask is forced to have
 * all NR_CPUS bits set, otherwise it is just the set of CPUs that
 * ACPI reports present at boot.

-- james
On Tue, 2019-11-19 at 09:17 -0800, James Smart wrote:
> > Actually, the behaviour of num_possible_cpus() under cpu hotplug seems
> > to be implementation-specific.
> > One might want to set it to the max value at all times, which has the
> > drawback that you'd need to scale per-cpu values with that number,
> > leading to a massive memory overhead as only very few installations
> > will ever reach that number.
> > Others might want to set it to the max value of the current
> > configuration, implying that it might change under CPU hotplug.
> > This seems to be the case for PowerPC, which is the configuration
> > that triggered the issue at hand.
> > But maybe it was a plain bug in the CPU hotplug implementation; one
> > should be asking BenH for details here.
> >
> > Cheers,
> >
> > Hannes
>
> That sure seems to be at odds with this comment from
> include/linux/cpumask.h:
>
> * The cpu_possible_mask is fixed at boot time, as the set of CPU id's
> * that it is possible might ever be plugged in at anytime during the
> * life of that system boot. The cpu_present_mask is dynamic(*),
> * representing which CPUs are currently plugged in. And
> * cpu_online_mask is the dynamic subset of cpu_present_mask,
> * indicating those CPUs available for scheduling.
>
> and
> #define num_possible_cpus() cpumask_weight(cpu_possible_mask)

Right, it should be fixed at boot. Where do you see powerpc change it?

Cheers,
Ben.
On Tue, 2019-11-19 at 09:34 -0800, James Smart wrote:
> Although... num_possible_cpus() shouldn't change, but I guess a cpu id
> can be greater than the possible cpu count.
>
> But there is this as well, also from include/linux/cpumask.h:
> * If HOTPLUG is enabled, then cpu_possible_mask is forced to have
> * all NR_CPUS bits set, otherwise it is just the set of CPUs that
> * ACPI reports present at boot.

That's probably an x86'ism... on powerpc the mask is set based on
what's really possible and it may not be all bits.

Cheers,
Ben.
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index ba26df90a36a..2380452a8efd 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
 	int tag;
 	struct fcp_cmd_rsp_buf *tmp = NULL;
 
-	cpu = raw_smp_processor_id();
+	cpu = min_t(u32, raw_smp_processor_id(),
+		    phba->sli4_hba.num_possible_cpu);
 	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
 		tag = blk_mq_unique_tag(cmnd->request);
 		idx = blk_mq_unique_tag_to_hwq(tag);
The lpfc driver allocates a cpu_map based on the number of possible
cpus during startup. If a CPU hotplug occurs the number of CPUs
might change, causing an out-of-bounds access when trying to lookup
the hardware index for a given CPU.

Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)