Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"

Message ID 1439353058-10001-1-git-send-email-acourbot@nvidia.com (mailing list archive)
State New, archived

Commit Message

Alexandre Courbot Aug. 12, 2015, 4:17 a.m. UTC
This reverts commit 1addc1264852

This commit seems to cause crashes in gk104_fifo_intr_runlist() by
returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
intended for GM20B which is not completely supported yet, let's revert
it for the time being.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
David, it would be great if this could be merged for 4.2 since lots of
users could potentially experience this issue. Thanks!

 drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c | 29 +++++++-----------------
 1 file changed, 8 insertions(+), 21 deletions(-)
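
To illustrate why a value such as 0xbad0da00 could crash an interrupt handler, here is a minimal, self-contained sketch. It assumes the handler treats each set bit of register 0x2a00 as an index into a small per-engine array; that is an assumption, since the handler's code is not shown in this thread. The names lowest_set_bit(), count_out_of_range() and NUM_ENGINES are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size of the per-engine array; not taken from the driver. */
#define NUM_ENGINES 7u

/* Index of the lowest set bit; the caller must pass a nonzero value. */
static unsigned lowest_set_bit(uint32_t v)
{
	unsigned i = 0;

	assert(v != 0);
	while (!(v & 1u)) {
		v >>= 1;
		i++;
	}
	return i;
}

/*
 * Walk a pending mask one bit at a time and count the bits whose
 * index would fall outside an array of num_engines entries.
 */
static unsigned count_out_of_range(uint32_t mask, unsigned num_engines)
{
	unsigned bad = 0;

	while (mask) {
		unsigned engn = lowest_set_bit(mask);

		if (engn >= num_engines)
			bad++;
		mask &= ~(UINT32_C(1) << engn);
	}
	return bad;
}
```

Under these assumptions, 0xbad0da00 has 13 bits set and the lowest is bit 9, so every one of them would index past a seven-entry array.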

Comments

afzal mohammed Aug. 12, 2015, 6 a.m. UTC | #1
Hi,

On Wed, Aug 12, 2015 at 01:17:38PM +0900, Alexandre Courbot wrote:
> This reverts commit 1addc1264852
> 
> This commit seems to cause crashes in gk104_fifo_intr_runlist() by
> returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
> intended for GM20B which is not completely supported yet, let's revert
> it for the time being.
> 
> Reported-by: Eric Biggers <ebiggers3@gmail.com>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> David, it would be great if this could be merged for 4.2 since lots of
> users could potentially experience this issue. Thanks!

Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>

Please help $subject reach mainline for 4.2. Without this revert, the
system here hangs at boot most (>90%) of the time.

As an aside, yesterday after a marathon git bisect, came to the same
solution (though I don't understand what that change means). Was
about to report it and saw this one. Thanks Alexandre.

Without the revert, in the rare cases where the system does boot, the
following error is logged in addition to what appears with the revert:

[    9.826010] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x122130 [ !ENGINE ]

Regards
Afzal
Alexandre Courbot Aug. 12, 2015, 7:12 a.m. UTC | #2
On Wed, Aug 12, 2015 at 3:00 PM, Afzal Mohammed <afzal.mohd.ma@gmail.com> wrote:
> Hi,
>
> On Wed, Aug 12, 2015 at 01:17:38PM +0900, Alexandre Courbot wrote:
>> This reverts commit 1addc1264852
>>
>> This commit seems to cause crashes in gk104_fifo_intr_runlist() by
>> returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
>> intended for GM20B which is not completely supported yet, let's revert
>> it for the time being.
>>
>> Reported-by: Eric Biggers <ebiggers3@gmail.com>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> David, it would be great if this could be merged for 4.2 since lots of
>> users could potentially experience this issue. Thanks!
>
> Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>

Thanks!

>
> Please help $subject reach mainline for 4.2. Without this revert, the
> system here hangs at boot most (>90%) of the time.
>
> As an aside, yesterday after a marathon git bisect, came to the same
> solution (though I don't understand what that change means). Was
> about to report it and saw this one. Thanks Alexandre.

All credit goes to Eric for bisecting and reporting this issue.

>
> Without the revert, in the rare cases where the system does boot, the
> following error is logged in addition to what appears with the revert:
>
> [    9.826010] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x122130 [ !ENGINE ]

Could you let me know what your card is? It may be useful to know the
range of affected cards when trying to fix this.

Thanks,
Alex.
afzal mohammed Aug. 12, 2015, 7:37 a.m. UTC | #3
Hi,

On Wed, Aug 12, 2015 at 04:12:15PM +0900, Alexandre Courbot wrote:

> Could you let me know what your card is? It may be useful to know the
> range of affected cards when trying to fix this.

The output of grepping dmesg for nouveau is included below, after my
signature. If that information is not sufficient, let me know where the
details you are asking for can be found.

Regards
Afzal

 nouveau 0000:01:00.0: enabling device (0004 -> 0007)
 nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x108120a1
 nouveau  [  DEVICE][0000:01:00.0] Chipset: GK208 (NV108)
 nouveau  [  DEVICE][0000:01:00.0] Family : NVE0
 nouveau  [   VBIOS][0000:01:00.0] using image from ACPI
 nouveau  [   VBIOS][0000:01:00.0] BIT signature found
 nouveau  [   VBIOS][0000:01:00.0] version 80.28.28.00.05
 nouveau  [ DEVINIT][0000:01:00.0] adaptor not initialised
 nouveau  [   VBIOS][0000:01:00.0] running init tables
 nouveau  [     PMC][0000:01:00.0] MSI interrupts enabled
 nouveau E[   PIBUS][0000:01:00.0] HUB0: 0x6013d4 0xffff5703 (0x1d708200)
 nouveau  [     PFB][0000:01:00.0] RAM type: DDR3
 nouveau  [     PFB][0000:01:00.0] RAM size: 2048 MiB
 nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
 nouveau E[   PIBUS][0000:01:00.0] GPC0: 0x4188ac 0x00000001 (0x1a70822e)
 nouveau  [    VOLT][0000:01:00.0] GPU voltage: 600000uv
 nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
 nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
 nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
 nouveau  [     CLK][0000:01:00.0] 07: core 405 MHz memory 810 MHz 
 nouveau  [     CLK][0000:01:00.0] 0a: core 405-1058 MHz memory 1620 MHz 
 nouveau  [     CLK][0000:01:00.0] 0f: core 405-1058 MHz memory 2002 MHz 
 nouveau  [     CLK][0000:01:00.0] --: core 405 MHz memory 810 MHz 
 nouveau  [     DRM] VRAM: 2048 MiB
 nouveau  [     DRM] GART: 1048576 MiB
 nouveau E[     DRM] Pointer to TMDS table invalid
 nouveau  [     DRM] DCB version 4.0
 nouveau E[     DRM] Pointer to flat panel table invalid
 nouveau  [     DRM] MM: using COPY for buffer copies
 [drm] Initialized nouveau 1.2.2 20120801 for 0000:01:00.0 on minor 1
 nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x122130 [ !ENGINE ]
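
The BOOT0 and Chipset lines in the log above appear to be related. Assuming the chipset ID occupies bits 20-28 of BOOT0 (an assumption that matches this log but is not stated anywhere in the thread), 0x108120a1 decodes to 0x108, which corresponds to the NV108 on the Chipset line. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Extract the chipset ID from a BOOT0 value.
 * Assumed layout: chipset ID in bits 20-28.
 */
static uint32_t boot0_chipset(uint32_t boot0)
{
	return (boot0 & UINT32_C(0x1ff00000)) >> 20;
}
```

This may help when comparing reports from different cards, since the chipset ID can be read straight from the BOOT0 line.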
Alexandre Courbot Aug. 12, 2015, 7:40 a.m. UTC | #4
2015-08-12 16:37 GMT+09:00 Afzal Mohammed <afzal.mohd.ma@gmail.com>:
> Hi,
>
> On Wed, Aug 12, 2015 at 04:12:15PM +0900, Alexandre Courbot wrote:
>
>> Could you let me know what your card is? It may be useful to know the
>> range of affected cards when trying to fix this.
>
> The output of grepping dmesg for nouveau is included below, after my
> signature. If that information is not sufficient, let me know where the
> details you are asking for can be found.

Great, thanks. Are you also on an optimus configuration with the
NVIDIA card being the secondary GPU?
afzal mohammed Aug. 12, 2015, 9:59 a.m. UTC | #5
Hi,

On Wed, Aug 12, 2015 at 04:40:57PM +0900, Alexandre Courbot wrote:

> Great, thanks. Are you also on an optimus configuration with the
> NVIDIA card being the secondary GPU?

Spec says graphic processor is NVIDIA GeForce NV14P-GV2 GT40M, system
is Lenovo E431 laptop.

I am a stranger here: I started my kernel journey heading north and
ended up south since the system wasn't booting :). I don't know how to
tell whether this is an Optimus configuration; if the above details
aren't enough, let me know how to find out.

Regards
Afzal
Alexandre Courbot Aug. 14, 2015, 3:49 a.m. UTC | #6
On Wed, Aug 12, 2015 at 6:59 PM, Afzal Mohammed <afzal.mohd.ma@gmail.com> wrote:
> Hi,
>
> On Wed, Aug 12, 2015 at 04:40:57PM +0900, Alexandre Courbot wrote:
>
>> Great, thanks. Are you also on an optimus configuration with the
>> NVIDIA card being the secondary GPU?
>
> Spec says graphic processor is NVIDIA GeForce NV14P-GV2 GT40M, system
> is Lenovo E431 laptop.
>
> I am a stranger here: I started my kernel journey heading north and
> ended up south since the system wasn't booting :). I don't know how to
> tell whether this is an Optimus configuration; if the above details
> aren't enough, let me know how to find out.

Thanks for the details!

An Optimus configuration means that display and basic acceleration are
provided by integrated Intel graphics, and the NVIDIA GPU can be
switched on/off dynamically to provide more power when needed.

According to your laptop reference, this seems to be the kind of
configuration you have. It is relevant because this issue seems to
happen when the NVIDIA GPU is switched off during boot.
Alexandre Courbot Aug. 17, 2015, 3:05 a.m. UTC | #7
Patch has landed in -rc7, thanks David!


Patch

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c
index 52c22b026005..e10f9644140f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c
@@ -166,30 +166,14 @@  gk104_fifo_context_attach(struct nvkm_object *parent,
 }
 
 static int
-gk104_fifo_chan_kick(struct gk104_fifo_chan *chan)
-{
-	struct nvkm_object *obj = (void *)chan;
-	struct gk104_fifo_priv *priv = (void *)obj->engine;
-
-	nv_wr32(priv, 0x002634, chan->base.chid);
-	if (!nv_wait(priv, 0x002634, 0x100000, 0x000000)) {
-		nv_error(priv, "channel %d [%s] kick timeout\n",
-			 chan->base.chid, nvkm_client_name(chan));
-		return -EBUSY;
-	}
-
-	return 0;
-}
-
-static int
 gk104_fifo_context_detach(struct nvkm_object *parent, bool suspend,
 			  struct nvkm_object *object)
 {
 	struct nvkm_bar *bar = nvkm_bar(parent);
+	struct gk104_fifo_priv *priv = (void *)parent->engine;
 	struct gk104_fifo_base *base = (void *)parent->parent;
 	struct gk104_fifo_chan *chan = (void *)parent;
 	u32 addr;
-	int ret;
 
 	switch (nv_engidx(object->engine)) {
 	case NVDEV_ENGINE_SW    : return 0;
@@ -204,9 +188,13 @@  gk104_fifo_context_detach(struct nvkm_object *parent, bool suspend,
 		return -EINVAL;
 	}
 
-	ret = gk104_fifo_chan_kick(chan);
-	if (ret && suspend)
-		return ret;
+	nv_wr32(priv, 0x002634, chan->base.chid);
+	if (!nv_wait(priv, 0x002634, 0xffffffff, chan->base.chid)) {
+		nv_error(priv, "channel %d [%s] kick timeout\n",
+			 chan->base.chid, nvkm_client_name(chan));
+		if (suspend)
+			return -EBUSY;
+	}
 
 	if (addr) {
 		nv_wo32(base, addr + 0x00, 0x00000000);
@@ -331,7 +319,6 @@  gk104_fifo_chan_fini(struct nvkm_object *object, bool suspend)
 		gk104_fifo_runlist_update(priv, chan->engine);
 	}
 
-	gk104_fifo_chan_kick(chan);
 	nv_wr32(priv, 0x800000 + (chid * 8), 0x00000000);
 	return nvkm_fifo_channel_fini(&chan->base, suspend);
 }
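
The two wait conditions in the patch differ only in their mask and expected value. Assuming nv_wait(obj, addr, mask, data) polls until (read & mask) == data (my reading of the helper, whose definition is not part of this diff), the difference can be sketched outside the kernel as follows. poll_masked_eq(), struct fake_reg and fake_read() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register reader, standing in for an MMIO read. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll up to max_polls times until (value & mask) == data. */
static bool poll_masked_eq(reg_read_fn rd, void *ctx,
			   uint32_t mask, uint32_t data, unsigned max_polls)
{
	for (unsigned i = 0; i < max_polls; i++) {
		if ((rd(ctx) & mask) == data)
			return true;
	}
	return false;
}

/* Fake register: reads `busy` for `remaining` reads, then `idle`. */
struct fake_reg {
	uint32_t busy;
	uint32_t idle;
	unsigned remaining;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->remaining) {
		r->remaining--;
		return r->busy;
	}
	return r->idle;
}
```

With a fake register that reads 0x100005 a few times and then 0x000005, both the restored condition (mask 0xffffffff, data 5) and the removed one (mask 0x100000, data 0) are eventually satisfied. A register that settles on a different value with bit 20 clear, such as 0x000003, satisfies only the removed condition.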