diff mbox series

[7/8] video/aperture: Only remove sysfb on the default vga pci device

Message ID 20230404201842.567344-7-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series [1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers | expand

Commit Message

Daniel Vetter April 4, 2023, 8:18 p.m. UTC
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.

This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
sysfb device registration when removing conflicting FBs"), where we
remove the sysfb when loading a driver for an unrelated pci device,
resulting in the user loosing their efifb console or similar.

Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.

Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15

Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
  drivers do not use drm_aperture_remove_conflicting_framebuffers(),
  so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
  this.

For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.

v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
  have any issues with making all this fully pci specific.

v3
- polish commit message (Javier)

Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner@nvidia.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Aaron Plattner <aplattner@nvidia.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
---
 drivers/video/aperture.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Aaron Plattner April 4, 2023, 8:59 p.m. UTC | #1
On 4/4/23 1:18 PM, Daniel Vetter wrote:
> Instead of calling aperture_remove_conflicting_devices() to remove the
> conflicting devices, just call to aperture_detach_devices() to detach
> the device that matches the same PCI BAR / aperture range. Since the
> former is just a wrapper of the latter plus a sysfb_disable() call,
> and now that's done in this function but only for the primary devices.
> 
> This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
> sysfb device registration when removing conflicting FBs"), where we
> remove the sysfb when loading a driver for an unrelated pci device,
> resulting in the user loosing their efifb console or similar.
> 
> Note that in practice this only is a problem with the nvidia blob,
> because that's the only gpu driver people might install which does not
> come with an fbdev driver of it's own. For everyone else the real gpu
> driver will restore a working console.

It might be worth noting that this also affects devices that have no 
driver installed, or where the driver failed to initialize or was 
configured not to set a mode. E.g. I reproduced this problem on a laptop 
with i915.modeset=0 and an NVIDIA driver that calls 
drm_fbdev_generic_setup. It would also reproduce on a system that sets 
modeset=0 (or has a GPU that's too new for its corresponding kernel 
driver) and that passes an NVIDIA GPU through to a VM using vfio-pci 
since that also calls aperture_remove_conflicting_pci_devices.

I agree that in practice this will mostly affect people with our driver 
until I get my changes to add drm_fbdev_generic_setup checked in. But 
these other cases don't seem all that unlikely to me.

-- Aaron

> Also note that in the referenced bug there's confusion that this same
> bug also happens on amdgpu. But that was just another amdgpu specific
> regression, which just happened to happen at roughly the same time and
> with the same user-observable symptoms. That bug is fixed now, see
> https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
> 
> Note that we should not have any such issues on non-pci multi-gpu
> issues, because I could only find two such cases:
> - SoC with some external panel over spi or similar. These panel
>    drivers do not use drm_aperture_remove_conflicting_framebuffers(),
>    so no problem.
> - vga+mga, which is a direct console driver and entirely bypasses all
>    this.
> 
> For the above reasons the cc: stable is just notionally, this patch
> will need a backport and that's up to nvidia if they care enough.
> 
> v2:
> - Explain a bit better why other multi-gpu that aren't pci shouldn't
>    have any issues with making all this fully pci specific.
> 
> v3
> - polish commit message (Javier)
> 
> Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
> Tested-by: Aaron Plattner <aplattner@nvidia.com>
> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Aaron Plattner <aplattner@nvidia.com>
> Cc: Javier Martinez Canillas <javierm@redhat.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
> ---
>   drivers/video/aperture.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> index 8f1437339e49..2394c2d310f8 100644
> --- a/drivers/video/aperture.c
> +++ b/drivers/video/aperture.c
> @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
>   
>   	primary = pdev == vga_default_device();
>   
> +	if (primary)
> +		sysfb_disable();
> +
>   	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>   		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>   			continue;
>   
>   		base = pci_resource_start(pdev, bar);
>   		size = pci_resource_len(pdev, bar);
> -		ret = aperture_remove_conflicting_devices(base, size, name);
> -		if (ret)
> -			return ret;
> +		aperture_detach_devices(base, size);
>   	}
>   
>   	if (primary) {
Daniel Vetter April 5, 2023, 8:48 a.m. UTC | #2
On Tue, Apr 04, 2023 at 01:59:33PM -0700, Aaron Plattner wrote:
> On 4/4/23 1:18 PM, Daniel Vetter wrote:
> > Instead of calling aperture_remove_conflicting_devices() to remove the
> > conflicting devices, just call to aperture_detach_devices() to detach
> > the device that matches the same PCI BAR / aperture range. Since the
> > former is just a wrapper of the latter plus a sysfb_disable() call,
> > and now that's done in this function but only for the primary devices.
> > 
> > This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
> > sysfb device registration when removing conflicting FBs"), where we
> > remove the sysfb when loading a driver for an unrelated pci device,
> > resulting in the user loosing their efifb console or similar.
> > 
> > Note that in practice this only is a problem with the nvidia blob,
> > because that's the only gpu driver people might install which does not
> > come with an fbdev driver of it's own. For everyone else the real gpu
> > driver will restore a working console.
> 
> It might be worth noting that this also affects devices that have no driver
> installed, or where the driver failed to initialize or was configured not to
> set a mode. E.g. I reproduced this problem on a laptop with i915.modeset=0
> and an NVIDIA driver that calls drm_fbdev_generic_setup. It would also
> reproduce on a system that sets modeset=0 (or has a GPU that's too new for
> its corresponding kernel driver) and that passes an NVIDIA GPU through to a
> VM using vfio-pci since that also calls
> aperture_remove_conflicting_pci_devices.
> 
> I agree that in practice this will mostly affect people with our driver
> until I get my changes to add drm_fbdev_generic_setup checked in. But these
> other cases don't seem all that unlikely to me.

Thomas Z. refactored the entire modeset=0 handling to be more consistent
across drivers, so I think in practice it'll again only happen with the
nvidia blob driver (unless you patch in the call to
drm_firmware_drivers_only()). Or if you dont use nomodeset or similar and
instead use a driver-specific module option, which isn't what howtos in
distros recommend.

I can add this to the commit message if you want?
-Daniel

> 
> -- Aaron
> 
> > Also note that in the referenced bug there's confusion that this same
> > bug also happens on amdgpu. But that was just another amdgpu specific
> > regression, which just happened to happen at roughly the same time and
> > with the same user-observable symptoms. That bug is fixed now, see
> > https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
> > 
> > Note that we should not have any such issues on non-pci multi-gpu
> > issues, because I could only find two such cases:
> > - SoC with some external panel over spi or similar. These panel
> >    drivers do not use drm_aperture_remove_conflicting_framebuffers(),
> >    so no problem.
> > - vga+mga, which is a direct console driver and entirely bypasses all
> >    this.
> > 
> > For the above reasons the cc: stable is just notionally, this patch
> > will need a backport and that's up to nvidia if they care enough.
> > 
> > v2:
> > - Explain a bit better why other multi-gpu that aren't pci shouldn't
> >    have any issues with making all this fully pci specific.
> > 
> > v3
> > - polish commit message (Javier)
> > 
> > Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
> > Tested-by: Aaron Plattner <aplattner@nvidia.com>
> > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Aaron Plattner <aplattner@nvidia.com>
> > Cc: Javier Martinez Canillas <javierm@redhat.com>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: Helge Deller <deller@gmx.de>
> > Cc: Sam Ravnborg <sam@ravnborg.org>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
> > ---
> >   drivers/video/aperture.c | 7 ++++---
> >   1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> > index 8f1437339e49..2394c2d310f8 100644
> > --- a/drivers/video/aperture.c
> > +++ b/drivers/video/aperture.c
> > @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
> >   	primary = pdev == vga_default_device();
> > +	if (primary)
> > +		sysfb_disable();
> > +
> >   	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> >   		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> >   			continue;
> >   		base = pci_resource_start(pdev, bar);
> >   		size = pci_resource_len(pdev, bar);
> > -		ret = aperture_remove_conflicting_devices(base, size, name);
> > -		if (ret)
> > -			return ret;
> > +		aperture_detach_devices(base, size);
> >   	}
> >   	if (primary) {
Javier Martinez Canillas April 5, 2023, 11:37 a.m. UTC | #3
Daniel Vetter <daniel.vetter@ffwll.ch> writes:

> Instead of calling aperture_remove_conflicting_devices() to remove the
> conflicting devices, just call to aperture_detach_devices() to detach
> the device that matches the same PCI BAR / aperture range. Since the
> former is just a wrapper of the latter plus a sysfb_disable() call,
> and now that's done in this function but only for the primary devices.
>
> This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
> sysfb device registration when removing conflicting FBs"), where we
> remove the sysfb when loading a driver for an unrelated pci device,
> resulting in the user loosing their efifb console or similar.
>
> Note that in practice this only is a problem with the nvidia blob,
> because that's the only gpu driver people might install which does not
> come with an fbdev driver of it's own. For everyone else the real gpu
> driver will restore a working console.
>
> Also note that in the referenced bug there's confusion that this same
> bug also happens on amdgpu. But that was just another amdgpu specific
> regression, which just happened to happen at roughly the same time and
> with the same user-observable symptoms. That bug is fixed now, see
> https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
>
> Note that we should not have any such issues on non-pci multi-gpu
> issues, because I could only find two such cases:
> - SoC with some external panel over spi or similar. These panel
>   drivers do not use drm_aperture_remove_conflicting_framebuffers(),
>   so no problem.
> - vga+mga, which is a direct console driver and entirely bypasses all
>   this.
>
> For the above reasons the cc: stable is just notionally, this patch
> will need a backport and that's up to nvidia if they care enough.
>
> v2:
> - Explain a bit better why other multi-gpu that aren't pci shouldn't
>   have any issues with making all this fully pci specific.
>
> v3
> - polish commit message (Javier)
>
> Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
> Tested-by: Aaron Plattner <aplattner@nvidia.com>
> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Aaron Plattner <aplattner@nvidia.com>
> Cc: Javier Martinez Canillas <javierm@redhat.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
> ---
>  drivers/video/aperture.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
diff mbox series

Patch

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 8f1437339e49..2394c2d310f8 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -321,15 +321,16 @@  int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
 
 	primary = pdev == vga_default_device();
 
+	if (primary)
+		sysfb_disable();
+
 	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
 		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
 			continue;
 
 		base = pci_resource_start(pdev, bar);
 		size = pci_resource_len(pdev, bar);
-		ret = aperture_remove_conflicting_devices(base, size, name);
-		if (ret)
-			return ret;
+		aperture_detach_devices(base, size);
 	}
 
 	if (primary) {