
lpfc: fixup out-of-bounds access during CPU hotplug

Message ID 20191118123012.99664-1-hare@suse.de (mailing list archive)
State Changes Requested
Series lpfc: fixup out-of-bounds access during CPU hotplug

Commit Message

Hannes Reinecke Nov. 18, 2019, 12:30 p.m. UTC
The lpfc driver allocates a cpu_map based on the number of possible
cpus during startup. If a CPU hotplug event occurs, the number of CPUs
might change, causing an out-of-bounds access when trying to look up
the hardware index for a given CPU.

Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Daniel Wagner Nov. 18, 2019, 1:57 p.m. UTC | #1
Hi Hannes,

On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
> 
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>

There are a few more places where I think the check is needed:

lpfc_nvme_fcp_io_submit(), lpfc_nvmet_ctxbuf_post(),
lpfc_nvmet_xmt_fcp_op(), lpfc_nvmet_rcv_unsol_abort(),
lpfc_nvme_io_cmd_wqe_cmpl(), lpfc_nvmet_unsol_fcp_buffer(),
lpfc_scsi_cmd_iocb_cmpl()

At least all of them seem to use the CPU id to access some array. I
suggest we review all those places as well.
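
A rough sketch of what a shared check could look like (a hypothetical
helper, not something that exists in the driver today), so the clamp
does not have to be open-coded at every call site:

static inline u32 lpfc_cmd_cpu(struct lpfc_hba *phba)
{
	u32 cpu = raw_smp_processor_id();

	/*
	 * The driver's per-cpu arrays are sized by num_possible_cpu,
	 * so the last valid index is num_possible_cpu - 1.
	 */
	if (cpu >= phba->sli4_hba.num_possible_cpu)
		cpu = phba->sli4_hba.num_possible_cpu - 1;
	return cpu;
}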

Thanks,
Daniel
Daniel Wagner Nov. 18, 2019, 2:22 p.m. UTC | #2
Hi Hannes,

On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
> 
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index ba26df90a36a..2380452a8efd 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>  	int tag;
>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>  
> -	cpu = raw_smp_processor_id();
> +	cpu = min_t(u32, raw_smp_processor_id(),
> +		    phba->sli4_hba.num_possible_cpu);

The index is limited by phba->cfg_hdw_queue and not the number of CPUs.

Thanks,
Daniel
Hannes Reinecke Nov. 18, 2019, 3:11 p.m. UTC | #3
On 11/18/19 3:22 PM, Daniel Wagner wrote:
> Hi Hannes,
> 
> On Mon, Nov 18, 2019 at 01:30:12PM +0100, Hannes Reinecke wrote:
>> The lpfc driver allocates a cpu_map based on the number of possible
>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>> might change, causing an out-of-bounds access when trying to lookup
>> the hardware index for a given CPU.
>>
>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
>> index ba26df90a36a..2380452a8efd 100644
>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>>  	int tag;
>>  	struct fcp_cmd_rsp_buf *tmp = NULL;
>>  
>> -	cpu = raw_smp_processor_id();
>> +	cpu = min_t(u32, raw_smp_processor_id(),
>> +		    phba->sli4_hba.num_possible_cpu);
> 
> The index is limited by phba->cfg_hdw_queue and not the number of CPUs.
> 
Nope.

phba->sli4_hba.cpu_map = kcalloc(phba->sli4_hba.num_possible_cpu,
				sizeof(struct lpfc_vector_map_info),
				GFP_KERNEL);
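
And lpfc_get_scsi_buf_s4() uses the cpu id as an index into exactly that
array; from memory the context around the hunk looks roughly like this
(a sketch, not a verbatim quote):

	cpu = raw_smp_processor_id();
	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
		tag = blk_mq_unique_tag(cmnd->request);
		idx = blk_mq_unique_tag_to_hwq(tag);
	} else {
		/* cpu indexes cpu_map, which has num_possible_cpu entries */
		idx = phba->sli4_hba.cpu_map[cpu].hdwq;
	}

which is why the clamp is against num_possible_cpu and not cfg_hdw_queue.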


Cheers,

Hannes
Daniel Wagner Nov. 18, 2019, 3:36 p.m. UTC | #4
> phba->sli4_hba.cpu_map = kcalloc(phba->sli4_hba.num_possible_cpu,
> 				sizeof(struct lpfc_vector_map_info),
> 				GFP_KERNEL);
> 

Indeed, I didn't realize this.

Thanks,
Daniel
James Smart Nov. 18, 2019, 7:28 p.m. UTC | #5
On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
> The lpfc driver allocates a cpu_map based on the number of possible
> cpus during startup. If a CPU hotplug occurs the number of CPUs
> might change, causing an out-of-bounds access when trying to lookup
> the hardware index for a given CPU.
>
> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index ba26df90a36a..2380452a8efd 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
>   	int tag;
>   	struct fcp_cmd_rsp_buf *tmp = NULL;
>   
> -	cpu = raw_smp_processor_id();
> +	cpu = min_t(u32, raw_smp_processor_id(),
> +		    phba->sli4_hba.num_possible_cpu);
>   	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>   		tag = blk_mq_unique_tag(cmnd->request);
>   		idx = blk_mq_unique_tag_to_hwq(tag);

This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 patches
that tie into cpu onlining/offlining and cpu hot add.

I am curious: if a cpu is hot removed, how does num_possible dynamically
change (to a lower value?) such that a thread can be running on a cpu
that returns a higher cpu number than num_possible?

-- james
Hannes Reinecke Nov. 18, 2019, 8:21 p.m. UTC | #6
On 11/18/19 8:28 PM, James Smart wrote:
> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>> The lpfc driver allocates a cpu_map based on the number of possible
>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>> might change, causing an out-of-bounds access when trying to lookup
>> the hardware index for a given CPU.
>>
>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c 
>> b/drivers/scsi/lpfc/lpfc_scsi.c
>> index ba26df90a36a..2380452a8efd 100644
>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct 
>> lpfc_nodelist *ndlp,
>>       int tag;
>>       struct fcp_cmd_rsp_buf *tmp = NULL;
>> -    cpu = raw_smp_processor_id();
>> +    cpu = min_t(u32, raw_smp_processor_id(),
>> +            phba->sli4_hba.num_possible_cpu);
>>       if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>           tag = blk_mq_unique_tag(cmnd->request);
>>           idx = blk_mq_unique_tag_to_hwq(tag);
> 
> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 patches 
> that tie into cpu onling/offlining and cpu hot add.
> 
> I am curious, how if a cpu is hot removed - how num_possible dynamically 
> changes (to a lower value ?) such that a thread can be running on a cpu 
> that returns a higher cpu number than num_possible ?
> 
Actually, the behaviour of num_possible_cpus() under cpu hotplug seems
to be implementation-specific.
One might want to set it to the maximum value at all times, which has
the drawback that per-cpu values would need to be scaled with that
number, leading to a massive memory overhead, as only very few
installations will ever reach that number.
Others might want to set it to the maximum of the current configuration,
implying that it might change under CPU hotplug.
That seems to be the case for PowerPC, which is where the issue at hand
was triggered.
But maybe it was a plain bug in the CPU hotplug implementation; one
should be asking BenH for details here.

Cheers,

Hannes
James Smart Nov. 19, 2019, 5:17 p.m. UTC | #7
On 11/18/2019 12:21 PM, Hannes Reinecke wrote:
> On 11/18/19 8:28 PM, James Smart wrote:
>> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>>> The lpfc driver allocates a cpu_map based on the number of possible
>>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>>> might change, causing an out-of-bounds access when trying to lookup
>>> the hardware index for a given CPU.
>>>
>>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c 
>>> b/drivers/scsi/lpfc/lpfc_scsi.c
>>> index ba26df90a36a..2380452a8efd 100644
>>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, 
>>> struct lpfc_nodelist *ndlp,
>>>       int tag;
>>>       struct fcp_cmd_rsp_buf *tmp = NULL;
>>> -    cpu = raw_smp_processor_id();
>>> +    cpu = min_t(u32, raw_smp_processor_id(),
>>> +            phba->sli4_hba.num_possible_cpu);
>>>       if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>>           tag = blk_mq_unique_tag(cmnd->request);
>>>           idx = blk_mq_unique_tag_to_hwq(tag);
>>
>> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 
>> patches that tie into cpu onling/offlining and cpu hot add.
>>
>> I am curious, how if a cpu is hot removed - how num_possible 
>> dynamically changes (to a lower value ?) such that a thread can be 
>> running on a cpu that returns a higher cpu number than num_possible ?
>>
> Actually, the behaviour of num_possible_cpus() under cpu hotplug seems 
> to be implementation-specific.
> One might want to set it to the max value at all times, which has the 
> drawback that you'd need to scale per-cpu values with that number, 
> leading to a massive memory overhead as only very few installation 
> will ever reach that number.
> Others might want to set to the max value at the current 
> configuration, implying that it might change under CPU hotplug.
> Which seems to be the case for PowerPC, which was the case which 
> triggered the issue at hand.
> But maybe it was a plain bug in the CPU hotplug implementation; one 
> should be asking BenH for details here.
>
> Cheers,
>
> Hannes

That sure seems to be at odds with this comment from 
include/linux/cpumask.h:

  *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
  *  that it is possible might ever be plugged in at anytime during the
  *  life of that system boot.  The cpu_present_mask is dynamic(*),
  *  representing which CPUs are currently plugged in.  And
  *  cpu_online_mask is the dynamic subset of cpu_present_mask,
  *  indicating those CPUs available for scheduling.

and
#define num_possible_cpus()     cpumask_weight(cpu_possible_mask)


-- james
James Smart Nov. 19, 2019, 5:34 p.m. UTC | #8
On 11/19/2019 9:17 AM, James Smart wrote:
>
>
> On 11/18/2019 12:21 PM, Hannes Reinecke wrote:
>> On 11/18/19 8:28 PM, James Smart wrote:
>>> On 11/18/2019 4:30 AM, Hannes Reinecke wrote:
>>>> The lpfc driver allocates a cpu_map based on the number of possible
>>>> cpus during startup. If a CPU hotplug occurs the number of CPUs
>>>> might change, causing an out-of-bounds access when trying to lookup
>>>> the hardware index for a given CPU.
>>>>
>>>> Suggested-by: Daniel Wagner <daniel.wagner@suse.com>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> ---
>>>>   drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c 
>>>> b/drivers/scsi/lpfc/lpfc_scsi.c
>>>> index ba26df90a36a..2380452a8efd 100644
>>>> --- a/drivers/scsi/lpfc/lpfc_scsi.c
>>>> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
>>>> @@ -642,7 +642,8 @@ lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, 
>>>> struct lpfc_nodelist *ndlp,
>>>>       int tag;
>>>>       struct fcp_cmd_rsp_buf *tmp = NULL;
>>>> -    cpu = raw_smp_processor_id();
>>>> +    cpu = min_t(u32, raw_smp_processor_id(),
>>>> +            phba->sli4_hba.num_possible_cpu);
>>>>       if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
>>>>           tag = blk_mq_unique_tag(cmnd->request);
>>>>           idx = blk_mq_unique_tag_to_hwq(tag);
>>>
>>> This should be unnecessary with the lpfc 12.6.0.1 and 12.6.0.2 
>>> patches that tie into cpu onling/offlining and cpu hot add.
>>>
>>> I am curious, how if a cpu is hot removed - how num_possible 
>>> dynamically changes (to a lower value ?) such that a thread can be 
>>> running on a cpu that returns a higher cpu number than num_possible ?
>>>
>> Actually, the behaviour of num_possible_cpus() under cpu hotplug 
>> seems to be implementation-specific.
>> One might want to set it to the max value at all times, which has the 
>> drawback that you'd need to scale per-cpu values with that number, 
>> leading to a massive memory overhead as only very few installation 
>> will ever reach that number.
>> Others might want to set to the max value at the current 
>> configuration, implying that it might change under CPU hotplug.
>> Which seems to be the case for PowerPC, which was the case which 
>> triggered the issue at hand.
>> But maybe it was a plain bug in the CPU hotplug implementation; one 
>> should be asking BenH for details here.
>>
>> Cheers,
>>
>> Hannes
>
> That sure seems to be at odds with this comment from 
> include/linux/cpumask.h:
>
>  *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
>  *  that it is possible might ever be plugged in at anytime during the
>  *  life of that system boot.  The cpu_present_mask is dynamic(*),
>  *  representing which CPUs are currently plugged in.  And
>  *  cpu_online_mask is the dynamic subset of cpu_present_mask,
>  *  indicating those CPUs available for scheduling.
>
> and
> #define num_possible_cpus()     cpumask_weight(cpu_possible_mask)
>
>
> -- james
>

Although... num_possible_cpus() shouldn't change, I guess a cpu id can
still be greater than the possible cpu count.

But there is this as well - also from include/linux/cpumask.h:
  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
  *  ACPI reports present at boot.
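
To make that concrete (a contrived sketch, not lpfc code): if the
possible mask is sparse, say { 0, 2, 5 }, then num_possible_cpus() is 3
while the largest valid cpu id is 5, so an array sized by
num_possible_cpus() and indexed by a raw cpu id can still overflow.
Sizing by nr_cpu_ids (highest possible cpu id + 1) is what avoids that:

	int *per_cpu_idx;
	unsigned int cpu;

	per_cpu_idx = kcalloc(nr_cpu_ids, sizeof(*per_cpu_idx), GFP_KERNEL);
	if (!per_cpu_idx)
		return -ENOMEM;

	/* cpu ids can be sparse and larger than the mask weight */
	for_each_possible_cpu(cpu)
		per_cpu_idx[cpu] = cpu;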

-- james
Benjamin Herrenschmidt Nov. 20, 2019, 3:45 a.m. UTC | #9
On Tue, 2019-11-19 at 09:17 -0800, James Smart wrote:
> 
> > Actually, the behaviour of num_possible_cpus() under cpu hotplug seems 
> > to be implementation-specific.
> > One might want to set it to the max value at all times, which has the 
> > drawback that you'd need to scale per-cpu values with that number, 
> > leading to a massive memory overhead as only very few installation 
> > will ever reach that number.
> > Others might want to set to the max value at the current 
> > configuration, implying that it might change under CPU hotplug.
> > Which seems to be the case for PowerPC, which was the case which 
> > triggered the issue at hand.
> > But maybe it was a plain bug in the CPU hotplug implementation; one 
> > should be asking BenH for details here.
> > 
> > Cheers,
> > 
> > Hannes
> 
> That sure seems to be at odds with this comment from 
> include/linux/cpumask.h:
> 
>   *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
>   *  that it is possible might ever be plugged in at anytime during the
>   *  life of that system boot.  The cpu_present_mask is dynamic(*),
>   *  representing which CPUs are currently plugged in.  And
>   *  cpu_online_mask is the dynamic subset of cpu_present_mask,
>   *  indicating those CPUs available for scheduling.
> 
> and
> #define num_possible_cpus()     cpumask_weight(cpu_possible_mask)

Right, it should be fixed at boot. Where do you see powerpc change it?

Cheers,
Ben.
Benjamin Herrenschmidt Nov. 20, 2019, 3:46 a.m. UTC | #10
On Tue, 2019-11-19 at 09:34 -0800, James Smart wrote:
> Although... num_possible_cpus() shouldn't change, but I guess a cpu
> id 
> can be greater than the possible cpu count.
> 
> But there is this as well - also from include/linux/cpumask.h:
>   *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
>   *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
>   *  ACPI reports present at boot.

That's probably an x86'ism... on powerpc the mask is set based on
what's really possible, and it may not have all bits set.

Cheers,
Ben.

Patch

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index ba26df90a36a..2380452a8efd 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -642,7 +642,8 @@  lpfc_get_scsi_buf_s4(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
 	int tag;
 	struct fcp_cmd_rsp_buf *tmp = NULL;
 
-	cpu = raw_smp_processor_id();
+	cpu = min_t(u32, raw_smp_processor_id(),
+		    phba->sli4_hba.num_possible_cpu);
 	if (cmnd && phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ) {
 		tag = blk_mq_unique_tag(cmnd->request);
 		idx = blk_mq_unique_tag_to_hwq(tag);