diff mbox series

drm/amdgpu: Fix resource leak on probe error path

Message ID nycvar.YFH.7.76.2106241319430.18969@cbobk.fhfr.pm (mailing list archive)
State New, archived
Headers show
Series drm/amdgpu: Fix resource leak on probe error path | expand

Commit Message

Jiri Kosina June 24, 2021, 11:20 a.m. UTC
From: Jiri Kosina <jkosina@suse.cz>

This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.

It is not true (as stated in the reverted commit changelog) that we never 
unmap the BAR on failure; it actually does happen properly on 
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() -> 
amdgpu_device_fini() error path.

What's worse, this commit actually completely breaks resource freeing on 
probe failure (like e.g. failure to load microcode), as 
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too 
early, leaving all the resources that'd normally be freed in 
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading 
to all sorts of oopses when someone tries to, for example, access the 
sysfs and procfs resources which are still around while the driver is 
gone.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Jiri Kosina July 1, 2021, 8:32 a.m. UTC | #1
On Thu, 24 Jun 2021, Jiri Kosina wrote:

> From: Jiri Kosina <jkosina@suse.cz>
> 
> This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.
> 
> It is not true (as stated in the reverted commit changelog) that we never 
> unmap the BAR on failure; it actually does happen properly on 
> amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() -> 
> amdgpu_device_fini() error path.
> 
> What's worse, this commit actually completely breaks resource freeing on 
> probe failure (like e.g. failure to load microcode), as 
> amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too 
> early, leaving all the resources that'd normally be freed in 
> amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading 
> to all sorts of oopses when someone tries to, for example, access the 
> sysfs and procfs resources which are still around while the driver is 
> gone.
> 
> Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
> Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Friendly ping on this one (as it's easily triggerable in the wild by just 
missing proper firwmare).

Thanks.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 57ec108b5972..0f1c0e17a587 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3414,13 +3414,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  	r = amdgpu_device_get_job_timeout_settings(adev);
>  	if (r) {
>  		dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> -		goto failed_unmap;
> +		return r;
>  	}
>  
>  	/* early init functions */
>  	r = amdgpu_device_ip_early_init(adev);
>  	if (r)
> -		goto failed_unmap;
> +		return r;
>  
>  	/* doorbell bar mapping and doorbell index init*/
>  	amdgpu_device_doorbell_init(adev);
> @@ -3646,10 +3646,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  failed:
>  	amdgpu_vf_error_trans_all(adev);
>  
> -failed_unmap:
> -	iounmap(adev->rmmio);
> -	adev->rmmio = NULL;
> -
>  	return r;
>  }
>  
> -- 
> 2.12.3
> 
>
Alex Deucher July 1, 2021, 2:07 p.m. UTC | #2
Applied.  Thanks!

Alex

On Thu, Jul 1, 2021 at 4:32 AM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Thu, 24 Jun 2021, Jiri Kosina wrote:
>
> > From: Jiri Kosina <jkosina@suse.cz>
> >
> > This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.
> >
> > It is not true (as stated in the reverted commit changelog) that we never
> > unmap the BAR on failure; it actually does happen properly on
> > amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
> > amdgpu_device_fini() error path.
> >
> > What's worse, this commit actually completely breaks resource freeing on
> > probe failure (like e.g. failure to load microcode), as
> > amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
> > early, leaving all the resources that'd normally be freed in
> > amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
> > to all sorts of oopses when someone tries to, for example, access the
> > sysfs and procfs resources which are still around while the driver is
> > gone.
> >
> > Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
> > Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
> > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>
> Friendly ping on this one (as it's easily triggerable in the wild by just
> missing proper firwmare).
>
> Thanks.
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 57ec108b5972..0f1c0e17a587 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3414,13 +3414,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >       r = amdgpu_device_get_job_timeout_settings(adev);
> >       if (r) {
> >               dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> > -             goto failed_unmap;
> > +             return r;
> >       }
> >
> >       /* early init functions */
> >       r = amdgpu_device_ip_early_init(adev);
> >       if (r)
> > -             goto failed_unmap;
> > +             return r;
> >
> >       /* doorbell bar mapping and doorbell index init*/
> >       amdgpu_device_doorbell_init(adev);
> > @@ -3646,10 +3646,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >  failed:
> >       amdgpu_vf_error_trans_all(adev);
> >
> > -failed_unmap:
> > -     iounmap(adev->rmmio);
> > -     adev->rmmio = NULL;
> > -
> >       return r;
> >  }
> >
> > --
> > 2.12.3
> >
> >
>
> --
> Jiri Kosina
> SUSE Labs
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 57ec108b5972..0f1c0e17a587 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3414,13 +3414,13 @@  int amdgpu_device_init(struct amdgpu_device *adev,
 	r = amdgpu_device_get_job_timeout_settings(adev);
 	if (r) {
 		dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
-		goto failed_unmap;
+		return r;
 	}
 
 	/* early init functions */
 	r = amdgpu_device_ip_early_init(adev);
 	if (r)
-		goto failed_unmap;
+		return r;
 
 	/* doorbell bar mapping and doorbell index init*/
 	amdgpu_device_doorbell_init(adev);
@@ -3646,10 +3646,6 @@  int amdgpu_device_init(struct amdgpu_device *adev,
 failed:
 	amdgpu_vf_error_trans_all(adev);
 
-failed_unmap:
-	iounmap(adev->rmmio);
-	adev->rmmio = NULL;
-
 	return r;
 }