diff mbox

scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure

Message ID 1521815845-28294-1-git-send-email-William.Kuzeja@stratus.com (mailing list archive)
State Accepted
Headers show

Commit Message

Bill Kuzeja March 23, 2018, 2:37 p.m. UTC
The code that fixes the crashes in the following commit introduced a
small memory leak:

commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")

Fixing this requires a bit of reworking, which I've explained. Also provide
some code cleanup.

Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>

---

There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
respectively (the sizes of req and rsp).

I originally put in checks to test for this condition which were based on
the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
allocated, then rsp and req were allocated as well. This is incorrect.
There is a window between these allocations:

       ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
                goto probe_hw_failed;

[if successful, both rsp and req allocated]

       base_vha = qla2x00_create_host(sht, ha);
                goto probe_hw_failed;

       ret = qla2x00_request_irqs(ha, rsp);
                goto probe_failed;

       if (qla2x00_alloc_queues(ha, req, rsp)) {
                goto probe_failed;

[if successful, now ha->rsp_q_map and ha->req_q_map allocated]

To simplify this, we should just set req and rsp to NULL after we free
them. Sounds simple enough? The problem is that req and rsp are pointers
defined in the qla2x00_probe_one and they are not always passed by
reference to the routines that free them.

Here are paths which can free req and rsp:

PATH 1:
qla2x00_probe_one
   ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
   [req and rsp are passed by reference, but if this fails, we currently
    do not NULL out req and rsp. Easily fixed]

PATH 2:
qla2x00_probe_one
   failing in qla2x00_request_irqs or qla2x00_alloc_queues
      probe_failed:
         qla2x00_free_device(base_vha);
            qla2x00_free_req_que(ha, req)
            qla2x00_free_rsp_que(ha, rsp)

PATH 3:
qla2x00_probe_one:
   failing in qla2x00_mem_alloc or qla2x00_create_host
      probe_hw_failed:
         qla2x00_free_req_que(ha, req)
         qla2x00_free_rsp_que(ha, rsp)

PATH 1: This should currently work, but it doesn't because rsp and rsp
are not set to NULL in qla2x00_mem_alloc. Easily remedied.

PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up
if qla2x00_alloc_queues succeeds.

In qla2x00_free_queues, we are protected from crashing if these don't
exist because req_qid_map and rsp_qid_map are only set on their
allocation. We are guarded in this way:

        for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
                if (!test_bit(cnt, ha->req_qid_map))
                        continue;

PATH 3: This works. We haven't freed req or rsp yet (or they were never
allocated if qla2x00_mem_alloc failed), so we'll attempt to free them
here.

To summarize, there are a few small changes to make this work correctly and
(and for some cleanup):

1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
qla2x00_mem_alloc so these are correctly set to NULL back in
qla2x00_probe_one

2) After jumping to probe_failed: and calling qla2x00_free_device,
explicitly set rsp and req to NULL so further calls with these pointers
do not crash, i.e. the free queue calls in the probe_hw_failed section
we fall through to.

3) Fix return code check in the call to qla2x00_alloc_queues. We currently
drop the return code on the floor. The probe fails but the caller of the
probe doesn't have an error code, so it attaches to pci. This can result
in a crash on module shutdown.

4) Remove unnecessary NULL checks in qla2x00_free_req_que,
qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and
vfrees in qla2x00_mem_free.

I tested this out running a scenario where the card breaks at various
times during initialization. I made sure I forced every error exit path
in qla2x00_probe_one.

---
 drivers/scsi/qla2xxx/qla_os.c | 44 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

Comments

Martin K. Petersen March 28, 2018, 9:45 p.m. UTC | #1
> The code that fixes the crashes in the following commit introduced a
> small memory leak:
>
> commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
>
> Fixing this requires a bit of reworking, which I've explained. Also provide
> some code cleanup.
>
> Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
> Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>

Himanshu?
Madhani, Himanshu March 28, 2018, 9:47 p.m. UTC | #2
> On Mar 28, 2018, at 2:45 PM, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> 
>> The code that fixes the crashes in the following commit introduced a
>> small memory leak:
>> 
>> commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
>> 
>> Fixing this requires a bit of reworking, which I've explained. Also provide
>> some code cleanup.
>> 
>> Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
>> Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
> 
> Himanshu?
> 

Still reviewing this one. Will respond soon 

> -- 
> Martin K. Petersen	Oracle Linux Engineering

Thanks,
- Himanshu
Madhani, Himanshu March 29, 2018, 5:24 p.m. UTC | #3
Hi Bill, 

> On Mar 23, 2018, at 7:37 AM, Bill Kuzeja <william.kuzeja@stratus.com> wrote:
> 
> The code that fixes the crashes in the following commit introduced a
> small memory leak:
> 
> commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
> 
> Fixing this requires a bit of reworking, which I've explained. Also provide
> some code cleanup.
> 
> Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
> Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
> 
> ---
> 
> There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
> fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
> respectively (the sizes of req and rsp).
> 
> I originally put in checks to test for this condition which were based on
> the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
> allocated, then rsp and req were allocated as well. This is incorrect.
> There is a window between these allocations:
> 
>       ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
>                goto probe_hw_failed;
> 
> [if successful, both rsp and req allocated]
> 
>       base_vha = qla2x00_create_host(sht, ha);
>                goto probe_hw_failed;
> 
>       ret = qla2x00_request_irqs(ha, rsp);
>                goto probe_failed;
> 
>       if (qla2x00_alloc_queues(ha, req, rsp)) {
>                goto probe_failed;
> 
> [if successful, now ha->rsp_q_map and ha->req_q_map allocated]
> 
> To simplify this, we should just set req and rsp to NULL after we free
> them. Sounds simple enough? The problem is that req and rsp are pointers
> defined in the qla2x00_probe_one and they are not always passed by
> reference to the routines that free them.
> 
> Here are paths which can free req and rsp:
> 
> PATH 1:
> qla2x00_probe_one
>   ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
>   [req and rsp are passed by reference, but if this fails, we currently
>    do not NULL out req and rsp. Easily fixed]
> 
> PATH 2:
> qla2x00_probe_one
>   failing in qla2x00_request_irqs or qla2x00_alloc_queues
>      probe_failed:
>         qla2x00_free_device(base_vha);
>            qla2x00_free_req_que(ha, req)
>            qla2x00_free_rsp_que(ha, rsp)
> 
> PATH 3:
> qla2x00_probe_one:
>   failing in qla2x00_mem_alloc or qla2x00_create_host
>      probe_hw_failed:
>         qla2x00_free_req_que(ha, req)
>         qla2x00_free_rsp_que(ha, rsp)
> 
> PATH 1: This should currently work, but it doesn't because rsp and rsp
> are not set to NULL in qla2x00_mem_alloc. Easily remedied.
> 
> PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
> derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up
> if qla2x00_alloc_queues succeeds.
> 
> In qla2x00_free_queues, we are protected from crashing if these don't
> exist because req_qid_map and rsp_qid_map are only set on their
> allocation. We are guarded in this way:
> 
>        for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
>                if (!test_bit(cnt, ha->req_qid_map))
>                        continue;
> 
> PATH 3: This works. We haven't freed req or rsp yet (or they were never
> allocated if qla2x00_mem_alloc failed), so we'll attempt to free them
> here.
> 
> To summarize, there are a few small changes to make this work correctly and
> (and for some cleanup):
> 
> 1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
> qla2x00_mem_alloc so these are correctly set to NULL back in
> qla2x00_probe_one
> 
> 2) After jumping to probe_failed: and calling qla2x00_free_device,
> explicitly set rsp and req to NULL so further calls with these pointers
> do not crash, i.e. the free queue calls in the probe_hw_failed section
> we fall through to.
> 
> 3) Fix return code check in the call to qla2x00_alloc_queues. We currently
> drop the return code on the floor. The probe fails but the caller of the
> probe doesn't have an error code, so it attaches to pci. This can result
> in a crash on module shutdown.
> 
> 4) Remove unnecessary NULL checks in qla2x00_free_req_que,
> qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and
> vfrees in qla2x00_mem_free.
> 
> I tested this out running a scenario where the card breaks at various
> times during initialization. I made sure I forced every error exit path
> in qla2x00_probe_one.
> 
> ---
> drivers/scsi/qla2xxx/qla_os.c | 44 +++++++++++++++++++++----------------------
> 1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 5c5dcca4..f70d7eb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -471,9 +471,6 @@ static int qla2x00_alloc_queues(struct qla_hw_data *ha, struct req_que *req,
> 
> static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req)
> {
> -	if (!ha->req_q_map)
> -		return;
> -
> 	if (IS_QLAFX00(ha)) {
> 		if (req && req->ring_fx00)
> 			dma_free_coherent(&ha->pdev->dev,
> @@ -484,17 +481,14 @@ static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req)
> 		(req->length + 1) * sizeof(request_t),
> 		req->ring, req->dma);
> 
> -	if (req) {
> +	if (req)
> 		kfree(req->outstanding_cmds);
> -		kfree(req);
> -	}
> +
> +	kfree(req);
> }
> 
> static void qla2x00_free_rsp_que(struct qla_hw_data *ha, struct rsp_que *rsp)
> {
> -	if (!ha->rsp_q_map)
> -		return;
> -
> 	if (IS_QLAFX00(ha)) {
> 		if (rsp && rsp->ring)
> 			dma_free_coherent(&ha->pdev->dev,
> @@ -505,8 +499,7 @@ static void qla2x00_free_rsp_que(struct qla_hw_data *ha, struct rsp_que *rsp)
> 		(rsp->length + 1) * sizeof(response_t),
> 		rsp->ring, rsp->dma);
> 	}
> -	if (rsp)
> -		kfree(rsp);
> +	kfree(rsp);
> }
> 
> static void qla2x00_free_queues(struct qla_hw_data *ha)
> @@ -3107,7 +3100,8 @@ static void qla2x00_iocb_work_fn(struct work_struct *work)
> 		goto probe_failed;
> 
> 	/* Alloc arrays of request and response ring ptrs */
> -	if (qla2x00_alloc_queues(ha, req, rsp)) {
> +	ret = qla2x00_alloc_queues(ha, req, rsp);
> +	if (ret) {
> 		ql_log(ql_log_fatal, base_vha, 0x003d,
> 		    "Failed to allocate memory for queue pointers..."
> 		    "aborting.\n");
> @@ -3408,8 +3402,15 @@ static void qla2x00_iocb_work_fn(struct work_struct *work)
> 	}
> 
> 	qla2x00_free_device(base_vha);
> -
> 	scsi_host_put(base_vha->host);
> +	/*
> +	 * Need to NULL out local req/rsp after
> +	 * qla2x00_free_device => qla2x00_free_queues frees
> +	 * what these are pointing to. Or else we'll
> +	 * fall over below in qla2x00_free_req/rsp_que.
> +	 */
> +	req = NULL;
> +	rsp = NULL;
> 
> probe_hw_failed:
> 	qla2x00_mem_free(ha);
> @@ -4115,6 +4116,7 @@ void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
> 	(*rsp)->dma = 0;
> fail_rsp_ring:
> 	kfree(*rsp);
> +	*rsp = NULL;
> fail_rsp:
> 	dma_free_coherent(&ha->pdev->dev, ((*req)->length + 1) *
> 		sizeof(request_t), (*req)->ring, (*req)->dma);
> @@ -4122,6 +4124,7 @@ void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
> 	(*req)->dma = 0;
> fail_req_ring:
> 	kfree(*req);
> +	*req = NULL;
> fail_req:
> 	dma_free_coherent(&ha->pdev->dev, sizeof(struct ct_sns_pkt),
> 		ha->ct_sns, ha->ct_sns_dma);
> @@ -4509,16 +4512,11 @@ void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
> 		dma_free_coherent(&ha->pdev->dev, ha->init_cb_size,
> 			ha->init_cb, ha->init_cb_dma);
> 
> -	if (ha->optrom_buffer)
> -		vfree(ha->optrom_buffer);
> -	if (ha->nvram)
> -		kfree(ha->nvram);
> -	if (ha->npiv_info)
> -		kfree(ha->npiv_info);
> -	if (ha->swl)
> -		kfree(ha->swl);
> -	if (ha->loop_id_map)
> -		kfree(ha->loop_id_map);
> +	vfree(ha->optrom_buffer);
> +	kfree(ha->nvram);
> +	kfree(ha->npiv_info);
> +	kfree(ha->swl);
> +	kfree(ha->loop_id_map);
> 
> 	ha->srb_mempool = NULL;
> 	ha->ctx_mempool = NULL;
> -- 
> 1.8.3.1
> 

Thanks a lot for detailed explanation. Testing was good on this patch. 

Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>

Thanks,
- Himanshu
diff mbox

Patch

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 5c5dcca4..f70d7eb 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -471,9 +471,6 @@  static int qla2x00_alloc_queues(struct qla_hw_data *ha, struct req_que *req,
 
 static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req)
 {
-	if (!ha->req_q_map)
-		return;
-
 	if (IS_QLAFX00(ha)) {
 		if (req && req->ring_fx00)
 			dma_free_coherent(&ha->pdev->dev,
@@ -484,17 +481,14 @@  static void qla2x00_free_req_que(struct qla_hw_data *ha, struct req_que *req)
 		(req->length + 1) * sizeof(request_t),
 		req->ring, req->dma);
 
-	if (req) {
+	if (req)
 		kfree(req->outstanding_cmds);
-		kfree(req);
-	}
+
+	kfree(req);
 }
 
 static void qla2x00_free_rsp_que(struct qla_hw_data *ha, struct rsp_que *rsp)
 {
-	if (!ha->rsp_q_map)
-		return;
-
 	if (IS_QLAFX00(ha)) {
 		if (rsp && rsp->ring)
 			dma_free_coherent(&ha->pdev->dev,
@@ -505,8 +499,7 @@  static void qla2x00_free_rsp_que(struct qla_hw_data *ha, struct rsp_que *rsp)
 		(rsp->length + 1) * sizeof(response_t),
 		rsp->ring, rsp->dma);
 	}
-	if (rsp)
-		kfree(rsp);
+	kfree(rsp);
 }
 
 static void qla2x00_free_queues(struct qla_hw_data *ha)
@@ -3107,7 +3100,8 @@  static void qla2x00_iocb_work_fn(struct work_struct *work)
 		goto probe_failed;
 
 	/* Alloc arrays of request and response ring ptrs */
-	if (qla2x00_alloc_queues(ha, req, rsp)) {
+	ret = qla2x00_alloc_queues(ha, req, rsp);
+	if (ret) {
 		ql_log(ql_log_fatal, base_vha, 0x003d,
 		    "Failed to allocate memory for queue pointers..."
 		    "aborting.\n");
@@ -3408,8 +3402,15 @@  static void qla2x00_iocb_work_fn(struct work_struct *work)
 	}
 
 	qla2x00_free_device(base_vha);
-
 	scsi_host_put(base_vha->host);
+	/*
+	 * Need to NULL out local req/rsp after
+	 * qla2x00_free_device => qla2x00_free_queues frees
+	 * what these are pointing to. Or else we'll
+	 * fall over below in qla2x00_free_req/rsp_que.
+	 */
+	req = NULL;
+	rsp = NULL;
 
 probe_hw_failed:
 	qla2x00_mem_free(ha);
@@ -4115,6 +4116,7 @@  void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
 	(*rsp)->dma = 0;
 fail_rsp_ring:
 	kfree(*rsp);
+	*rsp = NULL;
 fail_rsp:
 	dma_free_coherent(&ha->pdev->dev, ((*req)->length + 1) *
 		sizeof(request_t), (*req)->ring, (*req)->dma);
@@ -4122,6 +4124,7 @@  void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
 	(*req)->dma = 0;
 fail_req_ring:
 	kfree(*req);
+	*req = NULL;
 fail_req:
 	dma_free_coherent(&ha->pdev->dev, sizeof(struct ct_sns_pkt),
 		ha->ct_sns, ha->ct_sns_dma);
@@ -4509,16 +4512,11 @@  void qla2x00_mark_device_lost(scsi_qla_host_t *vha, fc_port_t *fcport,
 		dma_free_coherent(&ha->pdev->dev, ha->init_cb_size,
 			ha->init_cb, ha->init_cb_dma);
 
-	if (ha->optrom_buffer)
-		vfree(ha->optrom_buffer);
-	if (ha->nvram)
-		kfree(ha->nvram);
-	if (ha->npiv_info)
-		kfree(ha->npiv_info);
-	if (ha->swl)
-		kfree(ha->swl);
-	if (ha->loop_id_map)
-		kfree(ha->loop_id_map);
+	vfree(ha->optrom_buffer);
+	kfree(ha->nvram);
+	kfree(ha->npiv_info);
+	kfree(ha->swl);
+	kfree(ha->loop_id_map);
 
 	ha->srb_mempool = NULL;
 	ha->ctx_mempool = NULL;