Patchwork drm/nouveau/bar/gk20a: Avoid bar teardown during init

Submitter Jon Hunter
Date Jan. 4, 2018, 11:29 a.m.
Message ID <1515065349-10187-1-git-send-email-jonathanh@nvidia.com>
Permalink /patch/10144545/
State New

Comments

Jon Hunter - Jan. 4, 2018, 11:29 a.m.
Commit bbb163e18960 ("drm/nouveau/bar: implement bar1 teardown")
introduced a teardown helper function for BAR1. During initialisation
of Nouveau, all the teardown helpers are called once before their
init counterparts. For gk20a, after the BAR1 teardown function is
called, the device hangs during the initialisation of the FB
sub-device. It is unclear why this happens and it is still under
investigation. However, that commit prevents Tegra124 devices from
booting when Nouveau is enabled. To allow Tegra124 to boot, remove the
teardown helper for gk20a and only call the helper in nvkm_bar_fini()
when one is provided.

This is based upon a previous patch by Guillaume Tucker but limits
the workaround to only gk20a GPUs.

Fixes: bbb163e18960 ("drm/nouveau/bar: implement bar1 teardown")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
---
I am not happy that we do not yet fully understand the cause of
the hang, but I am talking with a few people at NVIDIA about this
and have a few things to look into. However, given that we are
close to v4.15 being released and I am not sure we will have a
proper fix in place before then, I think it is best to work around
this for now.

 drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c  | 3 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c | 1 -
 2 files changed, 2 insertions(+), 2 deletions(-)
Thierry Reding - Jan. 10, 2018, 12:20 p.m.
Acked-by: Thierry Reding <treding@nvidia.com>
Guillaume Tucker - Jan. 17, 2018, 9:27 a.m.
Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com>

   https://lava.collabora.co.uk/scheduler/job/1047172

Thanks for this workaround.  Looking forward to having this
platform back on track in mainline.  I'm happy to run this boot
test again with a proper fix in future patches, let me know if I
can be of any help.

Guillaume

Patch

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
index 9646adec57cb..243f0a5c8a62 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
@@ -73,7 +73,8 @@  static int
 nvkm_bar_fini(struct nvkm_subdev *subdev, bool suspend)
 {
 	struct nvkm_bar *bar = nvkm_bar(subdev);
-	bar->func->bar1.fini(bar);
+	if (bar->func->bar1.fini)
+		bar->func->bar1.fini(bar);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
index b10077d38839..35878fb538f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
@@ -26,7 +26,6 @@  gk20a_bar_func = {
 	.dtor = gf100_bar_dtor,
 	.oneinit = gf100_bar_oneinit,
 	.bar1.init = gf100_bar_bar1_init,
-	.bar1.fini = gf100_bar_bar1_fini,
 	.bar1.wait = gf100_bar_bar1_wait,
 	.bar1.vmm = gf100_bar_bar1_vmm,
 	.flush = g84_bar_flush,
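
The two hunks above work together: one chip's function table leaves the BAR1 teardown callback unset, and the common fini path checks the pointer before calling it. The following is a minimal standalone sketch of that optional-callback pattern, not the nouveau code itself. All `demo_*` names and the `fini_calls` counter are hypothetical and exist only for illustration; the real `struct nvkm_bar` and `struct nvkm_bar_func` have different layouts.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the nvkm types; not the real definitions. */
struct demo_bar;

struct demo_bar_func {
	void (*bar1_init)(struct demo_bar *bar);
	void (*bar1_fini)(struct demo_bar *bar);	/* optional, may be NULL */
};

struct demo_bar {
	const struct demo_bar_func *func;
	int bar1_enabled;	/* 1 while BAR1 is set up */
	int fini_calls;		/* counts teardown invocations (illustration only) */
};

static void demo_bar1_init(struct demo_bar *bar)
{
	bar->bar1_enabled = 1;
}

static void demo_bar1_fini(struct demo_bar *bar)
{
	bar->bar1_enabled = 0;
	bar->fini_calls++;
}

/* A chip that provides both callbacks. */
static const struct demo_bar_func demo_full_func = {
	.bar1_init = demo_bar1_init,
	.bar1_fini = demo_bar1_fini,
};

/* A chip that omits the teardown callback, as gk20a does after the patch. */
static const struct demo_bar_func demo_no_fini_func = {
	.bar1_init = demo_bar1_init,
	/* .bar1_fini is left unset and is therefore NULL */
};

/* Common fini path: skip the teardown when the chip does not provide one. */
static int demo_bar_fini(struct demo_bar *bar)
{
	if (bar->func->bar1_fini)
		bar->func->bar1_fini(bar);
	return 0;
}
```

Members omitted from a designated initializer are zero-initialized in C, so removing the `.bar1.fini` line is enough to make the pointer NULL. Without the guard in the common path, that omission would turn into a NULL function pointer call.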