mbox series

[V3,0/3] scsi: two fixes in scsi_add_host_with_dma

Message ID 20210531050727.2353973-1-ming.lei@redhat.com (mailing list archive)
Headers show
Series scsi: two fixes in scsi_add_host_with_dma | expand

Message

Ming Lei May 31, 2021, 5:07 a.m. UTC
Hello Martin,


Fix two memory leaks and one double free issue in alloc/add host
code path.


V3:
	- fix memory leak caused in scsi_host_alloc
	- comment typo suggested by John

V2:
	- add patch 2 for addressing shost leak in case of adding host
	  failure, reported by John Garry.

Ming Lei (3):
  scsi: core: use put_device() to release host
  scsi: core: fix failure handling of scsi_add_host_with_dma
  scsi: core: put ->shost_gendev.parent in failure handling path

 drivers/scsi/hosts.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>

Comments

John Garry June 1, 2021, 10:34 a.m. UTC | #1
On 31/05/2021 06:07, Ming Lei wrote:
> Hello Martin,
> 
> 
> Fix two memory leaks and one double free issue in alloc/add host
> code path.
> 
> 
> V3:
> 	- fix memory leak caused in scsi_host_alloc
> 	- comment typo suggested by John
> 
> V2:
> 	- add patch 2 for addressing shost leak in case of adding host
> 	  failure, reported by John Garry.
> 
> Ming Lei (3):
>    scsi: core: use put_device() to release host
>    scsi: core: fix failure handling of scsi_add_host_with_dma
>    scsi: core: put ->shost_gendev.parent in failure handling path
> 
>   drivers/scsi/hosts.c | 25 ++++++++++++-------------
>   1 file changed, 12 insertions(+), 13 deletions(-)
> 
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Hannes Reinecke <hare@suse.de>


I tested this again with the same experiment as before to error in
scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. 
The shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma().

We don't call scsi_host_cls_release() either, so I guess a ref count is 
leaked for shost_dev - I see its refcount is 1 at exit in 
scsi_add_host_with_dma(). We have the device_initialize(), device_add(), 
device_del() in the alloc and add host functions, but I don't know who 
is responsible for the final "device put".

Thanks,
John
Ming Lei June 1, 2021, 1:11 p.m. UTC | #2
On Tue, Jun 01, 2021 at 11:34:20AM +0100, John Garry wrote:
> On 31/05/2021 06:07, Ming Lei wrote:
> > Hello Martin,
> > 
> > 
> > Fix two memory leaks and one double free issue in alloc/add host
> > code path.
> > 
> > 
> > V3:
> > 	- fix memory leak caused in scsi_host_alloc
> > 	- comment typo suggested by John
> > 
> > V2:
> > 	- add patch 2 for addressing shost leak in case of adding host
> > 	  failure, reported by John Garry.
> > 
> > Ming Lei (3):
> >    scsi: core: use put_device() to release host
> >    scsi: core: fix failure handling of scsi_add_host_with_dma
> >    scsi: core: put ->shost_gendev.parent in failure handling path
> > 
> >   drivers/scsi/hosts.c | 25 ++++++++++++-------------
> >   1 file changed, 12 insertions(+), 13 deletions(-)
> > 
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: John Garry <john.garry@huawei.com>
> > Cc: Hannes Reinecke <hare@suse.de>
> 
> 
> I tested this again with the same experiment as before to error in
> scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The
> shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma().
> 
> We don't call scsi_host_cls_release() either, so I guess a ref count is
> leaked for shost_dev - I see its refcount is 1 at exit in
> scsi_add_host_with_dma(). We have the device_initialize(), device_add(),
> device_del() in the alloc and add host functions, but I don't know who is
> responsible for the final "device put".

Hammm, we still need to put ->shost_dev before returning the error, and the
following delta patch can fix the issue, and it should have been wrapped
into the 1st one.

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 22a58e453a0c..532165462a42 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 	pm_runtime_set_suspended(&shost->shost_gendev);
 	pm_runtime_put_noidle(&shost->shost_gendev);
  fail:
+	/* drop ref of ->shost_dev so that caller can release this host */
+	put_device(&shost->shost_dev);
 	return error;
 }
 EXPORT_SYMBOL(scsi_add_host_with_dma);


Thanks,
Ming
John Garry June 1, 2021, 3:07 p.m. UTC | #3
On 01/06/2021 14:11, Ming Lei wrote:
>> We don't call scsi_host_cls_release() either, so I guess a ref count is
>> leaked for shost_dev - I see its refcount is 1 at exit in
>> scsi_add_host_with_dma(). We have the device_initialize(), device_add(),
>> device_del() in the alloc and add host functions, but I don't know who is
>> responsible for the final "device put".
> Hammm, we still need to put ->shost_dev before returning the error, and the
> following delta patch can fix the issue, and it should have been wrapped
> into the 1st one.
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 22a58e453a0c..532165462a42 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
>   	pm_runtime_set_suspended(&shost->shost_gendev);
>   	pm_runtime_put_noidle(&shost->shost_gendev);
>    fail:
> +	/* drop ref of ->shost_dev so that caller can release this host */
> +	put_device(&shost->shost_dev);
>   	return error;
>   }
>   EXPORT_SYMBOL(scsi_add_host_with_dma);

That looks better now.

And we can see the equivalent on the normal removal path in 
scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a 
device_del()+put_device().

So could we actually just have:
out_del_dev:
	unregister_dev(&shost->shost_dev)

I am not sure if we are required to keep that shost_dev reference all 
the way until the exit, as you do.

Thanks,
John
Ming Lei June 2, 2021, 1:55 a.m. UTC | #4
On Tue, Jun 01, 2021 at 04:07:05PM +0100, John Garry wrote:
> On 01/06/2021 14:11, Ming Lei wrote:
> > > We don't call scsi_host_cls_release() either, so I guess a ref count is
> > > leaked for shost_dev - I see its refcount is 1 at exit in
> > > scsi_add_host_with_dma(). We have the device_initialize(), device_add(),
> > > device_del() in the alloc and add host functions, but I don't know who is
> > > responsible for the final "device put".
> > Hammm, we still need to put ->shost_dev before returning the error, and the
> > following delta patch can fix the issue, and it should have been wrapped
> > into the 1st one.
> > 
> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> > index 22a58e453a0c..532165462a42 100644
> > --- a/drivers/scsi/hosts.c
> > +++ b/drivers/scsi/hosts.c
> > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
> >   	pm_runtime_set_suspended(&shost->shost_gendev);
> >   	pm_runtime_put_noidle(&shost->shost_gendev);
> >    fail:
> > +	/* drop ref of ->shost_dev so that caller can release this host */
> > +	put_device(&shost->shost_dev);
> >   	return error;
> >   }
> >   EXPORT_SYMBOL(scsi_add_host_with_dma);
> 
> That looks better now.
> 
> And we can see the equivalent on the normal removal path in
> scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a
> device_del()+put_device().
> 
> So could we actually just have:
> out_del_dev:
> 	unregister_dev(&shost->shost_dev)
> 

No, we still have to call put_device(&shost->shost_dev) only in case of
failure before adding &shost->shost_dev.

> I am not sure if we are required to keep that shost_dev reference all the
> way until the exit, as you do.

That has been done in this way, the problem is that both .shost_dev and
.shost_gendev share same lifetime and memory(same struct Scsi_Host instance),
this kind of pattern isn't one usual driver core use case, and we have to
handle it carefully.


Thanks,
Ming