
drm/nouveau/gem: tolerate a buffer specified multiple times

Message ID 1438252085-4773-1-git-send-email-pure.logic@nexus-software.ie
State New, archived

Commit Message

Bryan O'Donoghue July 30, 2015, 10:28 a.m. UTC
Ubuntu is shipping Chrome Version 44.0.2403.125 (64-bit). With this version
of the browser and current tip-of-tree 86ea07ca846a I get the following
error message followed by a lock-up of X.

nouveau E[chrome[2737]] multiple instances of buffer 33 on validation list
nouveau E[chrome[2737]] validate_init
nouveau E[chrome[2737]] validate: -22
nouveau E[chrome[2737]] multiple instances of buffer 18 on validation list
nouveau E[chrome[2737]] validate_init
 nouveau E[chrome[2737]] validate: -22
nouveau E[   PFIFO][0000:01:00.0] PFIFO: read fault at
0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum
0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f80c000
[unknown]

This patch suggests a fix for this with the kernel simply tolerating an
application such as chrome requesting the same buffer more than once.

With the version of chrome given above, you can elicit this behaviour by
clicking on the bookmarks drop-down. This opens another window on top
of the current window. Without the fix included here, this leads to a hard
lockup of all windows on the desktop.

Chrome Version 44.0.2403.125 (64-bit)
Linux 4.2.0-rc4+ 86ea07ca846a

People are suggesting running Chrome with --disable-gpu; however, it is
possible to run Chrome in its default mode, so long as we tolerate the
above behaviour.

http://tinyurl.com/orvbzf3

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Bryan O'Donoghue July 30, 2015, 2:56 p.m. UTC | #1
On 30/07/15 15:52, Bryan O'Donoghue wrote:
> On 30/07/15 15:49, Peter Hurley wrote:
>> On 07/30/2015 10:12 AM, Ilia Mirkin wrote:
>>> Is this happening with libdrm 2.4.60? If so, that's a known
>>> (user-side) issue and should be fixed by using any version but that
>>> one.
>>
>> What's the freedesktop bugzilla # for reference?
>>
>> Regards,
>> Peter Hurley
>
> I believe it's this one
>
> https://bugs.freedesktop.org/show_bug.cgi?id=89842#c19
>

Not really a world of choice on ubuntu to fix it though...

deckard@aineko:~/Development/projectara$ apt-show-versions libdrm2
libdrm2:amd64/trusty-updates 2.4.60-2~ubuntu14.04.1 uptodate
libdrm2:i386/trusty-updates 2.4.60-2~ubuntu14.04.1 uptodate

:(
Bryan O'Donoghue July 30, 2015, 3:14 p.m. UTC | #2
On 30/07/15 16:02, Ilia Mirkin wrote:
>
> That's unfortunate. I know next to nothing about debian/ubuntu or how
> they do versions or how to even build packages for them. But they're
> big distros, presumably they have support teams of some sort, perhaps
> they can help you.
>
> Assuming that switching away does resolve the issue for you, perhaps
> you can also recommend that they avoid shipping that version, or
> include this nouveau fix in it:
>
> http://cgit.freedesktop.org/mesa/drm/commit/?id=812e8fe6ce46d733c30207ee26c788c61f546294
>
> This whole libdrm thing is a bit of a cluster%@#$ unfortunately --
> 2.4.60 is broken for nouveau, building even the latest released
> xf86-video-intel against 2.4.61+ causes it to not start ("fixed" in
> xf86-video-intel git), and newer mesa requires libdrm 2.4.60+.
>
>    -ilia
>

Matter of fact

apt-cache show libdrm2
sudo apt-get install libdrm2=2.4.56-1~ubuntu2
#sudo echo "package libdrm2" | sudo dpkg --set-selections

I'll give it a go at the end of the working day - should give enough 
time to recover if it all goes spectacularly wrong :)
Bryan O'Donoghue July 31, 2015, 9:53 a.m. UTC | #3
On 31/07/15 01:03, Bryan O'Donoghue wrote:
> On 30/07/15 22:45, Peter Hurley wrote:
>> [ +cc Debian maintainer ]
>>
>> On 07/30/2015 11:26 AM, Emil Velikov wrote:
>>> Fwiw debian has been tracking this as #789759, and they are shipping
>>> 2.4.62 which includes the fix.
>>
>> Unfortunately the LTS version of Ubuntu (trusty) was updated to 2.4.60
>> several days ago without this fix.
>>
>> I repackaged libdrm 2.4.60 with only the bug fix above and confirm the
>> patch above fixes the observed behavior in freedesktop bug# 89842/
>> debian bug# 789759.
>>
>> I pushed the repackage to Launchpad PPA @ ppa:phurley/libdrm
>>
>> Hopefully the Debian maintainer grabs this fix and updates the official
>> distribution version soon.
>>
>> Regards,
>> Peter Hurley
>
> Yep.
>
> Dropping down to 2.4.56-1~ubuntu2 definitely removes the
>
> nouveau E[chrome[2737]] multiple instances of buffer 33 on validation list
> nouveau E[chrome[2737]] validate_init
> nouveau E[chrome[2737]] validate: -22
> nouveau E[chrome[2737]] multiple instances of buffer 18 on validation list
> nouveau E[chrome[2737]] validate_init
>   nouveau E[chrome[2737]] validate: -22
> nouveau E[   PFIFO][0000:01:00.0] PFIFO: read fault at
> 0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum
> 0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f80c000
> [unknown]
>
> and hard lock-up of X. I'll update these guys with the fix
>
> http://tinyurl.com/orvbzf3

Hmm.

Interesting - I spoke way too soon on that.

Left my machine up overnight and lo and behold now that I do a dmesg I 
see...

nouveau E[chrome[2870]] multiple instances of buffer 16 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 24 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 167 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 251 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 248 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 249 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 230 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 253 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 255 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 230 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 257 on validation list
nouveau E[chrome[2870]] multiple instances of buffer 230 on validation list

deckard@aineko:~$ dpkg -s libdrm2
Package: libdrm2
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 106
Maintainer: Ubuntu X-SWAT <ubuntu-x@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: same
Source: libdrm
Version: 2.4.56-1~ubuntu2
Depends: libc6 (>= 2.17)
Pre-Depends: multiarch-support
Description: Userspace interface to kernel DRM services -- runtime
  This library implements the userspace interface to the kernel DRM
  services.  DRM stands for "Direct Rendering Manager", which is the
  kernelspace portion of the "Direct Rendering Infrastructure" (DRI).
  The DRI is currently used on Linux to provide hardware-accelerated
  OpenGL drivers.
  .
  This package provides the runtime environment for libdrm.
Orig-Maintainer: Debian X Strike Force <debian-x@lists.debian.org>

deckard@aineko:~$ uname -a
Linux aineko 4.2.0-rc4+ #50 SMP Thu Jul 30 01:22:01 IST 2015 x86_64 
x86_64 x86_64 GNU/Linux

Please note - this machine has the fix I proposed in the original patch 
applied so X is not locking up when the multiple instances message happens.

In any case, to answer the original question: I don't believe switching
away from 2.4.60 will resolve this issue. Ilia, do you happen to know
whether 2.4.56 should have the bug? Am I now using a version of libdrm2
that is supposed to be good?

Bryan
Bryan O'Donoghue July 31, 2015, 9:58 a.m. UTC | #4
On 31/07/15 10:53, Bryan O'Donoghue wrote:
> In any case to answer the original question - I don't believe switching
> away from 2.4.60 will resolve this issue

2.40.6 I mean :)
Bryan O'Donoghue July 31, 2015, 10:27 a.m. UTC | #5
On 31/07/15 10:58, Bryan O'Donoghue wrote:
>> In any case to answer the original question - I don't believe switching
>> away from 2.4.60 will resolve this issue
>
> 2.40.6 I mean :)
>

ah no... 2.4.60 is right...

Yes so Ilia - I've switched out 2.4.60 as per your suggestion to 2.4.56 
(getting the version numbers right :) ) and it's still definitely giving 
me the multiple instances message.
Bryan O'Donoghue July 31, 2015, 4:19 p.m. UTC | #6
On 30/07/15 22:45, Peter Hurley wrote:
> [ +cc Debian maintainer ]
>
> On 07/30/2015 11:26 AM, Emil Velikov wrote:
>> Fwiw debian has been tracking this as #789759, and they are shipping
>> 2.4.62 which includes the fix.
>
> Unfortunately the LTS version of Ubuntu (trusty) was updated to 2.4.60
> several days ago without this fix.
>
> I repackaged libdrm 2.4.60 with only the bug fix above and confirm the
> patch above fixes the observed behavior in freedesktop bug# 89842/
> debian bug# 789759.
>
> I pushed the repackage to Launchpad PPA @ ppa:phurley/libdrm
>
> Hopefully the Debian maintainer grabs this fix and updates the official
> distribution version soon.
>
> Regards,
> Peter Hurley
>

Peter.

I tried your suggested fix too as uploaded and it doesn't fix it either.

I think that this is a separate bug or the fix as applied to libdrm 
doesn't actually fix it.


deckard@aineko:~$ dmesg | tail -n 1
nouveau E[chrome[3176]] multiple instances of buffer 413 on validation list

deckard@aineko:~$ dpkg -s libdrm2
Package: libdrm2
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 106
Maintainer: Debian X Strike Force <debian-x@lists.debian.org>
Architecture: amd64
Multi-Arch: same
Source: libdrm
Version: 2.4.60-2ppa1~trusty1

deckard@aineko:~$ uname -a
Linux aineko 4.2.0-rc4+ #50 SMP Thu Jul 30 01:22:01 IST 2015 x86_64 
x86_64 x86_64 GNU/Linux
Ilia Mirkin July 31, 2015, 4:36 p.m. UTC | #7
On Fri, Jul 31, 2015 at 6:27 AM, Bryan O'Donoghue
<pure.logic@nexus-software.ie> wrote:
> ah no... 2.4.60 is right...
>
> Yes so Ilia - I've switched out 2.4.60 as per your suggestion to 2.4.56
> (getting the version numbers right :) ) and it's still definitely giving me
> the multiple instances message.

This is going to sound like a stupid question, but I'll ask anyways --
you *did* restart chrome after changing libdrm versions, right?

I was going to mention that there were a handful of fixes in libdrm,
potentially since 2.4.56 (I forget the exact versions), but if 2.4.60
also fails, then that would have them.

There was a final assert() added in 2.4.62, but that was to better
isolate the cause of weirdo crashes (i.e. crash when the thing going
wrong happens rather than stashing bad pointers for later very
confusing dereference). Not GPU crashes.

Just for your information,

nouveau E[   PFIFO][0000:01:00.0] PFIFO: read fault at
0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum
0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f80c000
[unknown]

means that there was a VM fault from an unknown GPU unit (???) when
the GPU was reading some resource. (The GPU has its own MMU.)
Unfortunately this can happen for one of a million reasons, the
biggest one being "unknown", but mesa definitely doesn't handle
command submission failures particularly well... should probably add a
"fail 1% of the time" thing to help fix that up.

Do you have a reproducible way of achieving the multiple buffer on
validation list thing? What GPU do you have? (Looking for a codename,
not a marketing name... lspci should have it... GFxxx or GKxxx or
Gxx.)

  -ilia
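
For illustration only, the "fail 1% of the time" idea mentioned above could
be sketched as a wrapper that makes a submission call fail artificially at a
configurable rate, so that error paths get exercised. Everything here is
hypothetical: submit_fn, count_submit() and submit_with_fault_injection() are
made-up names, not mesa, libdrm or kernel APIs, and rand() is only a
placeholder source of randomness.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type standing in for a real command submission path. */
typedef int (*submit_fn)(void *ctx);

/* Toy callback: counts how many times a submission actually went through. */
static int count_submit(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}

/*
 * Call `submit`, but first fail artificially with -EINVAL in roughly
 * `fail_per_10000` out of every 10000 calls (100 means about 1%).
 * A rate of 0 never injects a failure; a rate of 10000 always does.
 */
static int submit_with_fault_injection(submit_fn submit, void *ctx,
				       unsigned int fail_per_10000)
{
	assert(submit != NULL);
	assert(fail_per_10000 <= 10000);

	if ((unsigned int)(rand() % 10000) < fail_per_10000)
		return -EINVAL;	/* injected failure: submit is not called */

	return submit(ctx);
}
```

A caller would pass 100 as the rate during testing and 0 otherwise; whether
such a hook belongs in userspace or in the kernel is left open here.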
Bryan O'Donoghue July 31, 2015, 4:43 p.m. UTC | #8
On 31/07/15 17:36, Ilia Mirkin wrote:
> On Fri, Jul 31, 2015 at 6:27 AM, Bryan O'Donoghue
> <pure.logic@nexus-software.ie> wrote:
>> ah no... 2.4.60 is right...
>>
>> Yes so Ilia - I've switched out 2.4.60 as per your suggestion to 2.4.56
>> (getting the version numbers right :) ) and it's still definitely giving me
>> the multiple instances message.
>
> This is going to sound like a stupid question, but I'll ask anyways --
> you *did* restart chrome after changing libdrm versions, right?

There are no stupid questions - just stupid answers like 'whaddya mean 
restart chrome'

Seriously though, I've restarted the machine each time I've tried to 
switch out those libraries, so it's definitely not that.

> I was going to mention that there were a handful of fixes in libdrm,
> potentially since 2.4.56 (I forget the exact versions), but if 2.4.60
> also fails, then that would have them.
>
> There was a final assert() added in 2.4.62, but that was to better
> isolate the cause of weirdo crashes (i.e. crash when the thing going
> wrong happens rather than stashing bad pointers for later very
> confusing dereference). Not GPU crashes.
>
> Just for your information,
>
> nouveau E[   PFIFO][0000:01:00.0] PFIFO: read fault at
> 0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum
> 0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f80c000
> [unknown]
>
> means that there was VM fault from an unknown gpu unit (???) when
> reading some resource by the GPU.

OK, I was assuming it was a side effect of the -EINVAL when we get the 
multiple instances message.

> (The GPU has its own MMU.)
> Unfortunately this can happen for one of a million reasons, the
> biggest one being "unknown", but mesa definitely doesn't handle
> command submission failures particularly well... should probably add a
> "fail 1% of the time" thing to help fix that up.
>
> Do you have a reproducible way of achieving the multiple buffer on
> validation list thing? What GPU do you have? (Looking for a codename,
> not a marketing name... lspci should have it... GFxxx or GKxxx or

01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 
750M Mac Edition] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Apple Inc. Device 0130
	Flags: bus master, fast devsel, latency 0, IRQ 45
	Memory at c0000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 80000000 (64-bit, prefetchable) [size=256M]
	Memory at 90000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 1000 [size=128]
	Expansion ROM at c1000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau

Macbook pro retina 2014
Bryan O'Donoghue July 31, 2015, 6:11 p.m. UTC | #9
On 31/07/15 17:43, Bryan O'Donoghue wrote:
> On 31/07/15 17:36, Ilia Mirkin wrote:
>> Do you have a reproducible way of achieving the multiple buffer on
>> validation list thing?

Reliable enough. Start Chrome, then get Chrome to open a menu on top of
its own screen - for example click the top right menu bar - the thing
with the three horizontal bars, scroll down to 'recent tabs' and let the
mouse hover.

You'll get a menu that opens up over the main Chrome screen and at that
point you'll also get a 'multiple instances of buffer' message.

Basically drawing one window on top of another inside of the same Chrome
tab.

I guess the same PID is mapping the same piece of memory twice because
if I open a separate Chrome window (which will have a separate PID) and
drag one window over the other we don't see a repeat.

If it helps

deckard@aineko:~$ dmesg | tail -n 5
[ 6900.249427] nouveau E[chrome[3176]] multiple instances of buffer 456 
on validation list
[ 6920.992475] nouveau E[chrome[3176]] multiple instances of buffer 458 
on validation list
[ 6934.277352] nouveau E[chrome[3176]] multiple instances of buffer 458 
on validation list
[ 6994.303600] nouveau E[chrome[3176]] multiple instances of buffer 458 
on validation list
[ 7067.436049] nouveau E[chrome[3176]] multiple instances of buffer 456 
on validation list


deckard@aineko:~$ ps -ax | grep chrome | grep 3176
  3176 pts/6    Sl+    0:29 /opt/google/chrome/chrome --type=gpu-process 
--channel=3143.0.1295591 ives-passed-by-fd --v8-snapshot-passed-by-fd 
--supports-dual-gpus=false --gpu-driver-bug-workarounds=2,29,32,45,55,57 
--disable-accelerated-video-decode --gpu-vendor-id=0x10de 
--gpu-device-id=0x0fe9 --gpu-driver-vendor --gpu-driver-version 
--v8-natives-passed-by-fd --v8-snapshot-passed-by-fd

Bryan O'Donoghue Aug. 3, 2015, 12:51 a.m. UTC | #10
On 31/07/15 19:11, Bryan O'Donoghue wrote:
> On 31/07/15 17:43, Bryan O'Donoghue wrote:
>> On 31/07/15 17:36, Ilia Mirkin wrote:
>>> Do you have a reproducible way of achieving the multiple buffer on
>>> validation list thing?

Ilia, Peter.

I've filed a bug for you here : 
https://bugs.freedesktop.org/show_bug.cgi?id=91535

I've verified that Peter's PPA library, when installed, fixes the race
condition you guys were talking about by running the test program
tests/nouveau/threaded, so the issue we're discussing here is a
separate one.

Cheers,
Bryan

Patch

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index af1ee51..a9694faad 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -401,9 +401,7 @@  retry:
 		if (nvbo->reserved_by && nvbo->reserved_by == file_priv) {
 			NV_PRINTK(error, cli, "multiple instances of buffer %d on "
 				      "validation list\n", b->handle);
-			drm_gem_object_unreference_unlocked(gem);
-			ret = -EINVAL;
-			break;
+			continue;
 		}
 
 		ret = ttm_bo_reserve(&nvbo->bo, true, false, true, &op->ticket);
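
The behavioural difference the hunk above is after can be sketched in a
self-contained toy model. This is not driver code: struct toy_bo,
toy_validate_init(), toy_validate_fini() and the dup_policy enum are
hypothetical names invented for illustration, and buffer handles are simply
treated as array indices. The removed drm_gem_object_unreference_unlocked()
call suggests the lookup holds a reference, which a real skip path would
likely still need to drop; reference counting is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the driver's struct nouveau_bo. */
struct toy_bo {
	const void *reserved_by;	/* NULL when not on any validation list */
};

enum dup_policy { DUP_REJECT, DUP_TOLERATE };

/* Release every buffer in the pool that is currently reserved by `owner`. */
static void toy_validate_fini(struct toy_bo *pool, size_t pool_len,
			      const void *owner)
{
	for (size_t i = 0; i < pool_len; i++)
		if (pool[i].reserved_by == owner)
			pool[i].reserved_by = NULL;
}

/*
 * Reserve each requested buffer for `owner`. Returns the number of distinct
 * buffers reserved, or -1 (after unwinding) when a buffer is requested twice
 * and the policy is DUP_REJECT. Under DUP_TOLERATE a repeat is skipped.
 */
static int toy_validate_init(struct toy_bo *pool, size_t pool_len,
			     const unsigned int *req, size_t req_len,
			     const void *owner, enum dup_policy policy)
{
	int reserved = 0;

	assert(owner != NULL);
	for (size_t i = 0; i < req_len; i++) {
		struct toy_bo *bo;

		assert(req[i] < pool_len);
		bo = &pool[req[i]];

		if (bo->reserved_by == owner) {
			if (policy == DUP_REJECT) {
				toy_validate_fini(pool, pool_len, owner);
				return -1;
			}
			continue;	/* already reserved by us: skip it */
		}

		assert(bo->reserved_by == NULL);	/* toy model: no contention */
		bo->reserved_by = owner;
		reserved++;
	}
	return reserved;
}
```

With a request list such as {1, 3, 1}, the reject policy fails the whole
submission, while the tolerate policy reserves buffers 1 and 3 once each.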