mbox series

[0/2] drm/nouveau: Two more fixes

Message ID 20190916143606.9272-1-thierry.reding@gmail.com (mailing list archive)
Headers show
Series drm/nouveau: Two more fixes | expand

Message

Thierry Reding Sept. 16, 2019, 2:36 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

Hi Ben,

I messed up the ordering of patches in my tree a bit, so these two fixes
got separated from the others. I don't consider these particularily
urgent because the crash that the first one fixes only happens on gp10b
which we don't enable by default yet and the second patch fixes a crash
that only happens on module unload (or driver unbind, more accurately),
which isn't a terribly common thing to do.

I'll be sending out fixes shortly to make the GP10B work more properly
on a wider range of Jetson TX2 devices and enable it by default.

One thing to mention is that I'm not exactly sure if the first patch is
the right thing to do. I haven't seen any issues after that change, but
I'm also not exactly sure I understand what BAR2 is used for, so I don't
know if I would've even covered those code paths (other than the one
causing the crash at probe time) in my tests.

It'd be great to get Lyude's feedback on the second patch, since that
call to pci_disable_device() was rather oddly placed and I'm not sure if
that was essential for things to work or whether the slightly different
point in time where it's called after this patch is also okay. It looks
to me like it should work fine, but I don't currently have a way to test
this on desktop GPUs.

Thierry

Thierry Reding (2):
  drm/nouveau: tegra: Fix NULL pointer dereference
  drm/nouveau: tegra: Do not try to disable PCI device

 drivers/gpu/drm/nouveau/nouveau_drm.c         |  3 +-
 .../drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 30 +++++++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

Comments

Ben Skeggs Sept. 17, 2019, 6:07 a.m. UTC | #1
On Tue, 17 Sep 2019 at 00:36, Thierry Reding <thierry.reding@gmail.com> wrote:
>
> From: Thierry Reding <treding@nvidia.com>
>
> Hi Ben,
>
> I messed up the ordering of patches in my tree a bit, so these two fixes
> got separated from the others. I don't consider these particularily
> urgent because the crash that the first one fixes only happens on gp10b
> which we don't enable by default yet and the second patch fixes a crash
> that only happens on module unload (or driver unbind, more accurately),
> which isn't a terribly common thing to do.
>
> I'll be sending out fixes shortly to make the GP10B work more properly
> on a wider range of Jetson TX2 devices and enable it by default.
>
> One thing to mention is that I'm not exactly sure if the first patch is
> the right thing to do. I haven't seen any issues after that change, but
> I'm also not exactly sure I understand what BAR2 is used for, so I don't
> know if I would've even covered those code paths (other than the one
> causing the crash at probe time) in my tests.
BAR2 on dGPUs is used to map kernel-level GPU objects in VRAM so they
can be accessed by the driver.  It's pretty much a smaller version of
BAR1, but intended for a different purpose.

On dGPUs, there's a couple of places (fault buffer address, and fault
method buffer on volta) where the GPU wants PRI regs to be poked with
an offset within BAR2 rather than an aperture+offset combination.  I'm
not 100% sure what Tegra parts do here, but presumably if it's working
for you, they're happy to just accept a system memory address instead.

I guess this would be the right thing to do here in that situation.
The more obvious (from a "reading the code" POV) thing to do would be
to write Tegra-specific versions of the functions that use
nvkm_memory_bar2() to perform this mapping, and use nvkm_memory_addr()
instead but I'm not sure if we need/want to go to that effort.  It's
conceivable it could be required at some point.

Ben.

>
> It'd be great to get Lyude's feedback on the second patch, since that
> call to pci_disable_device() was rather oddly placed and I'm not sure if
> that was essential for things to work or whether the slightly different
> point in time where it's called after this patch is also okay. It looks
> to me like it should work fine, but I don't currently have a way to test
> this on desktop GPUs.
>
> Thierry
>
> Thierry Reding (2):
>   drm/nouveau: tegra: Fix NULL pointer dereference
>   drm/nouveau: tegra: Do not try to disable PCI device
>
>  drivers/gpu/drm/nouveau/nouveau_drm.c         |  3 +-
>  .../drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 30 +++++++++++++++++++
>  2 files changed, 31 insertions(+), 2 deletions(-)
>
> --
> 2.23.0
>
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
Thierry Reding Sept. 17, 2019, 9:07 a.m. UTC | #2
On Tue, Sep 17, 2019 at 04:07:54PM +1000, Ben Skeggs wrote:
> On Tue, 17 Sep 2019 at 00:36, Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Hi Ben,
> >
> > I messed up the ordering of patches in my tree a bit, so these two fixes
> > got separated from the others. I don't consider these particularily
> > urgent because the crash that the first one fixes only happens on gp10b
> > which we don't enable by default yet and the second patch fixes a crash
> > that only happens on module unload (or driver unbind, more accurately),
> > which isn't a terribly common thing to do.
> >
> > I'll be sending out fixes shortly to make the GP10B work more properly
> > on a wider range of Jetson TX2 devices and enable it by default.
> >
> > One thing to mention is that I'm not exactly sure if the first patch is
> > the right thing to do. I haven't seen any issues after that change, but
> > I'm also not exactly sure I understand what BAR2 is used for, so I don't
> > know if I would've even covered those code paths (other than the one
> > causing the crash at probe time) in my tests.
> BAR2 on dGPUs is used to map kernel-level GPU objects in VRAM so they
> can be accessed by the driver.  It's pretty much a smaller version of
> BAR1, but intended for a different purpose.
> 
> On dGPUs, there's a couple of places (fault buffer address, and fault
> method buffer on volta) where the GPU wants PRI regs to be poked with
> an offset within BAR2 rather than an aperture+offset combination.  I'm
> not 100% sure what Tegra parts do here, but presumably if it's working
> for you, they're happy to just accept a system memory address instead.
> 
> I guess this would be the right thing to do here in that situation.
> The more obvious (from a "reading the code" POV) thing to do would be
> to write Tegra-specific versions of the functions that use
> nvkm_memory_bar2() to perform this mapping, and use nvkm_memory_addr()
> instead but I'm not sure if we need/want to go to that effort.  It's
> conceivable it could be required at some point.

Yeah, that sounds slightly more correct. I'll look into it and see if I
can come up with something.

Thierry

> 
> Ben.
> 
> >
> > It'd be great to get Lyude's feedback on the second patch, since that
> > call to pci_disable_device() was rather oddly placed and I'm not sure if
> > that was essential for things to work or whether the slightly different
> > point in time where it's called after this patch is also okay. It looks
> > to me like it should work fine, but I don't currently have a way to test
> > this on desktop GPUs.
> >
> > Thierry
> >
> > Thierry Reding (2):
> >   drm/nouveau: tegra: Fix NULL pointer dereference
> >   drm/nouveau: tegra: Do not try to disable PCI device
> >
> >  drivers/gpu/drm/nouveau/nouveau_drm.c         |  3 +-
> >  .../drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 30 +++++++++++++++++++
> >  2 files changed, 31 insertions(+), 2 deletions(-)
> >
> > --
> > 2.23.0
> >
> > _______________________________________________
> > Nouveau mailing list
> > Nouveau@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/nouveau