Message ID | 20210531050727.2353973-1-ming.lei@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | scsi: two fixes in scsi_add_host_with_dma | expand |
On 31/05/2021 06:07, Ming Lei wrote: > Hello Martin, > > > Fix two memory leaks and one double free issue in alloc/add host > code path. > > > V3: > - fix memory leak caused in scsi_host_alloc > - comment typo suggested by John > > V2: > - add patch 2 for addressing shost leak in case of adding host > failure, reported by John Garry. > > Ming Lei (3): > scsi: core: use put_device() to release host > scsi: core: fix failure handling of scsi_add_host_with_dma > scsi: core: put ->shost_gendev.parent in failure handling path > > drivers/scsi/hosts.c | 25 ++++++++++++------------- > 1 file changed, 12 insertions(+), 13 deletions(-) > > Cc: Bart Van Assche <bvanassche@acm.org> > Cc: John Garry <john.garry@huawei.com> > Cc: Hannes Reinecke <hare@suse.de> I tested this again with the same experiment as before to error in scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma(). We don't call scsi_host_cls_release() either, so I guess a ref count is leaked for shost_dev - I see its refcount is 1 at exit in scsi_add_host_with_dma(). We have the device_initialize(), device_add(), device_del() in the alloc and add host functions, but I don't know who is responsible for the final "device put". Thanks, John
On Tue, Jun 01, 2021 at 11:34:20AM +0100, John Garry wrote: > On 31/05/2021 06:07, Ming Lei wrote: > > Hello Martin, > > > > > > Fix two memory leaks and one double free issue in alloc/add host > > code path. > > > > > > V3: > > - fix memory leak caused in scsi_host_alloc > > - comment typo suggested by John > > > > V2: > > - add patch 2 for addressing shost leak in case of adding host > > failure, reported by John Garry. > > > > Ming Lei (3): > > scsi: core: use put_device() to release host > > scsi: core: fix failure handling of scsi_add_host_with_dma > > scsi: core: put ->shost_gendev.parent in failure handling path > > > > drivers/scsi/hosts.c | 25 ++++++++++++------------- > > 1 file changed, 12 insertions(+), 13 deletions(-) > > > > Cc: Bart Van Assche <bvanassche@acm.org> > > Cc: John Garry <john.garry@huawei.com> > > Cc: Hannes Reinecke <hare@suse.de> > > > I tested this again with the same experiment as before to error in > scsi_add_host_with_dma(), and we don't call scsi_host_dev_release() now. The > shost_gendev dev refcount is 2 upon exit in scsi_add_host_with_dma(). > > We don't call scsi_host_cls_release() either, so I guess a ref count is > leaked for shost_dev - I see its refcount is 1 at exit in > scsi_add_host_with_dma(). We have the device_initialize(), device_add(), > device_del() in the alloc and add host functions, but I don't know who is > responsible for the final "device put". Hammm, we still need to put ->shost_dev before returning the error, and the following delta patch can fix the issue, and it should have been wrapped into the 1st one. diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 22a58e453a0c..532165462a42 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); fail: + /* drop ref of ->shost_dev so that caller can release this host */ + put_device(&shost->shost_dev); return error; } EXPORT_SYMBOL(scsi_add_host_with_dma); Thanks, Ming
On 01/06/2021 14:11, Ming Lei wrote: >> We don't call scsi_host_cls_release() either, so I guess a ref count is >> leaked for shost_dev - I see its refcount is 1 at exit in >> scsi_add_host_with_dma(). We have the device_initialize(), device_add(), >> device_del() in the alloc and add host functions, but I don't know who is >> responsible for the final "device put". > Hammm, we still need to put ->shost_dev before returning the error, and the > following delta patch can fix the issue, and it should have been wrapped > into the 1st one. > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > index 22a58e453a0c..532165462a42 100644 > --- a/drivers/scsi/hosts.c > +++ b/drivers/scsi/hosts.c > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > pm_runtime_set_suspended(&shost->shost_gendev); > pm_runtime_put_noidle(&shost->shost_gendev); > fail: > + /* drop ref of ->shost_dev so that caller can release this host */ > + put_device(&shost->shost_dev); > return error; > } > EXPORT_SYMBOL(scsi_add_host_with_dma); That looks better now. And we can see the equivalent on the normal removal path in scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a device_del()+put_device(). So could we actually just have: out_del_dev: unregister_dev(&shost->shost_dev) I am not sure if we are required to keep that shost_dev reference all the way until the exit, as you do. Thanks, John
On Tue, Jun 01, 2021 at 04:07:05PM +0100, John Garry wrote: > On 01/06/2021 14:11, Ming Lei wrote: > > > We don't call scsi_host_cls_release() either, so I guess a ref count is > > > leaked for shost_dev - I see its refcount is 1 at exit in > > > scsi_add_host_with_dma(). We have the device_initialize(), device_add(), > > > device_del() in the alloc and add host functions, but I don't know who is > > > responsible for the final "device put". > > Hammm, we still need to put ->shost_dev before returning the error, and the > > following delta patch can fix the issue, and it should have been wrapped > > into the 1st one. > > > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > > index 22a58e453a0c..532165462a42 100644 > > --- a/drivers/scsi/hosts.c > > +++ b/drivers/scsi/hosts.c > > @@ -306,6 +306,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > > pm_runtime_set_suspended(&shost->shost_gendev); > > pm_runtime_put_noidle(&shost->shost_gendev); > > fail: > > + /* drop ref of ->shost_dev so that caller can release this host */ > > + put_device(&shost->shost_dev); > > return error; > > } > > EXPORT_SYMBOL(scsi_add_host_with_dma); > > That looks better now. > > And we can see the equivalent on the normal removal path in > scsi_remove_host() -> device_unregister(&shost->shost_dev), which does a > device_del()+put_device(). > > So could we actually just have: > out_del_dev: > unregister_dev(&shost->shost_dev) > No, we still have to call put_device(&shost->shost_dev) only in case of failure before adding &shost->shost_dev. > I am not sure if we are required to keep that shost_dev reference all the > way until the exit, as you do. That has been done in this way, the problem is that both .shost_dev and .shost_gendev share same lifetime and memory(same struct Scsi_Host instance), this kind of pattern isn't one usual driver core use case, and we have to handle it carefully. Thanks, Ming