diff mbox series

drm/aperture: Run fbdev removal before internal helpers

Message ID 20220617121027.30273-1-tzimmermann@suse.de (mailing list archive)
State New, archived
Headers show
Series drm/aperture: Run fbdev removal before internal helpers | expand

Commit Message

Thomas Zimmermann June 17, 2022, 12:10 p.m. UTC
Always run fbdev removal first to remove simpledrm via
sysfb_disable(). This clears the internal state. The later call
to drm_aperture_detach_drivers() then does nothing. Otherwise,
with drm_aperture_detach_drivers() running first, the call to
sysfb_disable() uses inconsistent state.

Example backtrace show below:

[   11.663422] ==================================================================
[   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
[   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
[   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
	.19.0-rc2-1-default+ #1689
[   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
[   11.663447] Call Trace:
[   11.663449]  <TASK>
[   11.663451]  ? device_del+0x79/0x5f0
[   11.663456]  dump_stack_lvl+0x5b/0x73
[   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
[   11.663468]  ? device_del+0x79/0x5f0
[   11.663471]  ? device_del+0x79/0x5f0
[   11.663475]  print_report.cold+0x3c/0x21c
[   11.663481]  ? lock_acquired+0x87/0x1e0
[   11.663484]  ? lock_acquired+0x87/0x1e0
[   11.663489]  ? device_del+0x79/0x5f0
[   11.663492]  kasan_report+0xbf/0xf0
[   11.663498]  ? device_del+0x79/0x5f0
[   11.663503]  device_del+0x79/0x5f0
[   11.663509]  ? device_remove_attrs+0x170/0x170
[   11.663514]  ? lock_is_held_type+0xe8/0x140
[   11.663523]  platform_device_del.part.0+0x19/0xe0
[   11.663530]  platform_device_unregister+0x1c/0x30
[   11.663535]  sysfb_disable+0x2d/0x70
[   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
[   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
[   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
[   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
[   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
---
 drivers/gpu/drm/drm_aperture.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

Comments

Javier Martinez Canillas June 17, 2022, 12:29 p.m. UTC | #1
[adding Zack and Alex to Cc list]

Hello Thomas,

Thanks a lot for tracking this down and figuring out the root cause!

On 6/17/22 14:10, Thomas Zimmermann wrote:
> Always run fbdev removal first to remove simpledrm via
> sysfb_disable(). This clears the internal state. The later call
> to drm_aperture_detach_drivers() then does nothing. Otherwise,
> with drm_aperture_detach_drivers() running first, the call to
> sysfb_disable() uses inconsistent state.
> 
> Example backtrace show below:
> 
> [   11.663422] ==================================================================
> [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
> [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
> [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
> 	.19.0-rc2-1-default+ #1689
> [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
> [   11.663447] Call Trace:
> [   11.663449]  <TASK>
> [   11.663451]  ? device_del+0x79/0x5f0
> [   11.663456]  dump_stack_lvl+0x5b/0x73
> [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
> [   11.663468]  ? device_del+0x79/0x5f0
> [   11.663471]  ? device_del+0x79/0x5f0
> [   11.663475]  print_report.cold+0x3c/0x21c
> [   11.663481]  ? lock_acquired+0x87/0x1e0
> [   11.663484]  ? lock_acquired+0x87/0x1e0
> [   11.663489]  ? device_del+0x79/0x5f0
> [   11.663492]  kasan_report+0xbf/0xf0
> [   11.663498]  ? device_del+0x79/0x5f0
> [   11.663503]  device_del+0x79/0x5f0
> [   11.663509]  ? device_remove_attrs+0x170/0x170
> [   11.663514]  ? lock_is_held_type+0xe8/0x140
> [   11.663523]  platform_device_del.part.0+0x19/0xe0
> [   11.663530]  platform_device_unregister+0x1c/0x30
> [   11.663535]  sysfb_disable+0x2d/0x70
> [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
> [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
> [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
> [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
> [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
> 

Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
this seems to be the exact same issue that he reported yesterday.

Patch looks good to me and this seems to be the correct fix IMO.
That way we won't re-introduce the issue that was fixed by the
sysfb_unregister() function, that is to remove a pdev even if wasn't
bound to a driver to prevent a late simpledrm registration to match.

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

I wonder what's the best way to coordinate with Alex to merge this
fix and your patch that moves the aperture code out of DRM, since
as far as I can tell both should target the v5.20 release.

> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Fixes: 873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
> Cc: Javier Martinez Canillas <javierm@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Zhen Lei <thunder.leizhen@huawei.com>
> Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
> ---
>  drivers/gpu/drm/drm_aperture.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c
> index 74bd4a76b253..059fd71424f6 100644
> --- a/drivers/gpu/drm/drm_aperture.c
> +++ b/drivers/gpu/drm/drm_aperture.c
> @@ -329,7 +329,20 @@ int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
>  						     const struct drm_driver *req_driver)
>  {
>  	resource_size_t base, size;
> -	int bar, ret = 0;
> +	int bar, ret;
> +
> +	/*
> +	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
> +	 * otherwise the vga fbdev driver falls over.
> +	 */
> +#if IS_REACHABLE(CONFIG_FB)
> +	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
> +	if (ret)
> +		return ret;
> +#endif
> +	ret = vga_remove_vgacon(pdev);
> +	if (ret)
> +		return ret;
>  
>  	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>  		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> @@ -339,15 +352,6 @@ int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
>  		drm_aperture_detach_drivers(base, size);
>  	}
>  
> -	/*
> -	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
> -	 * otherwise the vga fbdev driver falls over.
> -	 */
> -#if IS_REACHABLE(CONFIG_FB)
> -	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
> -#endif
> -	if (ret == 0)
> -		ret = vga_remove_vgacon(pdev);
> -	return ret;
> +	return 0;
>  }
>  EXPORT_SYMBOL(drm_aperture_remove_conflicting_pci_framebuffers);
Thomas Zimmermann June 17, 2022, 12:41 p.m. UTC | #2
Hi

Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas:
> [adding Zack and Alex to Cc list]
> 
> Hello Thomas,
> 
> Thanks a lot for tracking this down and figuring out the root cause!
> 
> On 6/17/22 14:10, Thomas Zimmermann wrote:
>> Always run fbdev removal first to remove simpledrm via
>> sysfb_disable(). This clears the internal state. The later call
>> to drm_aperture_detach_drivers() then does nothing. Otherwise,
>> with drm_aperture_detach_drivers() running first, the call to
>> sysfb_disable() uses inconsistent state.
>>
>> Example backtrace show below:
>>
>> [   11.663422] ==================================================================
>> [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
>> [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
>> [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
>> 	.19.0-rc2-1-default+ #1689
>> [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
>> [   11.663447] Call Trace:
>> [   11.663449]  <TASK>
>> [   11.663451]  ? device_del+0x79/0x5f0
>> [   11.663456]  dump_stack_lvl+0x5b/0x73
>> [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
>> [   11.663468]  ? device_del+0x79/0x5f0
>> [   11.663471]  ? device_del+0x79/0x5f0
>> [   11.663475]  print_report.cold+0x3c/0x21c
>> [   11.663481]  ? lock_acquired+0x87/0x1e0
>> [   11.663484]  ? lock_acquired+0x87/0x1e0
>> [   11.663489]  ? device_del+0x79/0x5f0
>> [   11.663492]  kasan_report+0xbf/0xf0
>> [   11.663498]  ? device_del+0x79/0x5f0
>> [   11.663503]  device_del+0x79/0x5f0
>> [   11.663509]  ? device_remove_attrs+0x170/0x170
>> [   11.663514]  ? lock_is_held_type+0xe8/0x140
>> [   11.663523]  platform_device_del.part.0+0x19/0xe0
>> [   11.663530]  platform_device_unregister+0x1c/0x30
>> [   11.663535]  sysfb_disable+0x2d/0x70
>> [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
>> [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
>> [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
>> [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
>> [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
>>
> 
> Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
> this seems to be the exact same issue that he reported yesterday.

I'll do.

> 
> Patch looks good to me and this seems to be the correct fix IMO.
> That way we won't re-introduce the issue that was fixed by the
> sysfb_unregister() function, that is to remove a pdev even if wasn't
> bound to a driver to prevent a late simpledrm registration to match.
> 
> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

Thanks!

> 
> I wonder what's the best way to coordinate with Alex to merge this
> fix and your patch that moves the aperture code out of DRM, since
> as far as I can tell both should target the v5.20 release.

If nothing else comes in, I'll merge this patch on Monday and send Alex 
an updated version of the vfio patch.

Best regards
Thomas

> 
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Fixes: 873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
>> Cc: Javier Martinez Canillas <javierm@redhat.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Cc: Sam Ravnborg <sam@ravnborg.org>
>> Cc: Helge Deller <deller@gmx.de>
>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Zhen Lei <thunder.leizhen@huawei.com>
>> Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
>> ---
>>   drivers/gpu/drm/drm_aperture.c | 26 +++++++++++++++-----------
>>   1 file changed, 15 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c
>> index 74bd4a76b253..059fd71424f6 100644
>> --- a/drivers/gpu/drm/drm_aperture.c
>> +++ b/drivers/gpu/drm/drm_aperture.c
>> @@ -329,7 +329,20 @@ int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
>>   						     const struct drm_driver *req_driver)
>>   {
>>   	resource_size_t base, size;
>> -	int bar, ret = 0;
>> +	int bar, ret;
>> +
>> +	/*
>> +	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
>> +	 * otherwise the vga fbdev driver falls over.
>> +	 */
>> +#if IS_REACHABLE(CONFIG_FB)
>> +	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
>> +	if (ret)
>> +		return ret;
>> +#endif
>> +	ret = vga_remove_vgacon(pdev);
>> +	if (ret)
>> +		return ret;
>>   
>>   	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
>>   		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>> @@ -339,15 +352,6 @@ int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
>>   		drm_aperture_detach_drivers(base, size);
>>   	}
>>   
>> -	/*
>> -	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
>> -	 * otherwise the vga fbdev driver falls over.
>> -	 */
>> -#if IS_REACHABLE(CONFIG_FB)
>> -	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
>> -#endif
>> -	if (ret == 0)
>> -		ret = vga_remove_vgacon(pdev);
>> -	return ret;
>> +	return 0;
>>   }
>>   EXPORT_SYMBOL(drm_aperture_remove_conflicting_pci_framebuffers);
>
Zack Rusin June 17, 2022, 12:49 p.m. UTC | #3
On Fri, 2022-06-17 at 14:41 +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas:
> > [adding Zack and Alex to Cc list]
> > 
> > Hello Thomas,
> > 
> > Thanks a lot for tracking this down and figuring out the root cause!
> > 
> > On 6/17/22 14:10, Thomas Zimmermann wrote:
> > > Always run fbdev removal first to remove simpledrm via
> > > sysfb_disable(). This clears the internal state. The later call
> > > to drm_aperture_detach_drivers() then does nothing. Otherwise,
> > > with drm_aperture_detach_drivers() running first, the call to
> > > sysfb_disable() uses inconsistent state.
> > > 
> > > Example backtrace show below:
> > > 
> > > [   11.663422] ==================================================================
> > > [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
> > > [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
> > > [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
> > > 	.19.0-rc2-1-default+ #1689
> > > [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
> > > [   11.663447] Call Trace:
> > > [   11.663449]  <TASK>
> > > [   11.663451]  ? device_del+0x79/0x5f0
> > > [   11.663456]  dump_stack_lvl+0x5b/0x73
> > > [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
> > > [   11.663468]  ? device_del+0x79/0x5f0
> > > [   11.663471]  ? device_del+0x79/0x5f0
> > > [   11.663475]  print_report.cold+0x3c/0x21c
> > > [   11.663481]  ? lock_acquired+0x87/0x1e0
> > > [   11.663484]  ? lock_acquired+0x87/0x1e0
> > > [   11.663489]  ? device_del+0x79/0x5f0
> > > [   11.663492]  kasan_report+0xbf/0xf0
> > > [   11.663498]  ? device_del+0x79/0x5f0
> > > [   11.663503]  device_del+0x79/0x5f0
> > > [   11.663509]  ? device_remove_attrs+0x170/0x170
> > > [   11.663514]  ? lock_is_held_type+0xe8/0x140
> > > [   11.663523]  platform_device_del.part.0+0x19/0xe0
> > > [   11.663530]  platform_device_unregister+0x1c/0x30
> > > [   11.663535]  sysfb_disable+0x2d/0x70
> > > [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
> > > [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
> > > [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
> > > [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
> > > [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
> > > 
> > 
> > Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
> > this seems to be the exact same issue that he reported yesterday.
> 
> I'll do.
> 
> > 
> > Patch looks good to me and this seems to be the correct fix IMO.
> > That way we won't re-introduce the issue that was fixed by the
> > sysfb_unregister() function, that is to remove a pdev even if wasn't
> > bound to a driver to prevent a late simpledrm registration to match.
> > 
> > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> 
> Thanks!
> 
> > 
> > I wonder what's the best way to coordinate with Alex to merge this
> > fix and your patch that moves the aperture code out of DRM, since
> > as far as I can tell both should target the v5.20 release.
> 
> If nothing else comes in, I'll merge this patch on Monday and send Alex 
> an updated version of the vfio patch.

Thanks Thomas. I just tested it on our arm64 configs and it fixes the crash on boot
we've been seeing and gets them running again.

Reviewed-by: Zack Rusin <zackr@vmware.com>

z
Alex Williamson June 17, 2022, 2:12 p.m. UTC | #4
On Fri, 17 Jun 2022 14:41:01 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:

> Hi
> 
> Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas:
> > [adding Zack and Alex to Cc list]
> > 
> > Hello Thomas,
> > 
> > Thanks a lot for tracking this down and figuring out the root cause!
> > 
> > On 6/17/22 14:10, Thomas Zimmermann wrote:  
> >> Always run fbdev removal first to remove simpledrm via
> >> sysfb_disable(). This clears the internal state. The later call
> >> to drm_aperture_detach_drivers() then does nothing. Otherwise,
> >> with drm_aperture_detach_drivers() running first, the call to
> >> sysfb_disable() uses inconsistent state.
> >>
> >> Example backtrace show below:
> >>
> >> [   11.663422] ==================================================================
> >> [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
> >> [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
> >> [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
> >> 	.19.0-rc2-1-default+ #1689
> >> [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
> >> [   11.663447] Call Trace:
> >> [   11.663449]  <TASK>
> >> [   11.663451]  ? device_del+0x79/0x5f0
> >> [   11.663456]  dump_stack_lvl+0x5b/0x73
> >> [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
> >> [   11.663468]  ? device_del+0x79/0x5f0
> >> [   11.663471]  ? device_del+0x79/0x5f0
> >> [   11.663475]  print_report.cold+0x3c/0x21c
> >> [   11.663481]  ? lock_acquired+0x87/0x1e0
> >> [   11.663484]  ? lock_acquired+0x87/0x1e0
> >> [   11.663489]  ? device_del+0x79/0x5f0
> >> [   11.663492]  kasan_report+0xbf/0xf0
> >> [   11.663498]  ? device_del+0x79/0x5f0
> >> [   11.663503]  device_del+0x79/0x5f0
> >> [   11.663509]  ? device_remove_attrs+0x170/0x170
> >> [   11.663514]  ? lock_is_held_type+0xe8/0x140
> >> [   11.663523]  platform_device_del.part.0+0x19/0xe0
> >> [   11.663530]  platform_device_unregister+0x1c/0x30
> >> [   11.663535]  sysfb_disable+0x2d/0x70
> >> [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
> >> [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
> >> [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
> >> [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
> >> [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
> >>  
> > 
> > Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
> > this seems to be the exact same issue that he reported yesterday.  
> 
> I'll do.
> 
> > 
> > Patch looks good to me and this seems to be the correct fix IMO.
> > That way we won't re-introduce the issue that was fixed by the
> > sysfb_unregister() function, that is to remove a pdev even if wasn't
> > bound to a driver to prevent a late simpledrm registration to match.
> > 
> > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>  
> 
> Thanks!
> 
> > 
> > I wonder what's the best way to coordinate with Alex to merge this
> > fix and your patch that moves the aperture code out of DRM, since
> > as far as I can tell both should target the v5.20 release.  
> 
> If nothing else comes in, I'll merge this patch on Monday and send Alex 
> an updated version of the vfio patch.

Please also publish a topic branch for the base of that patch if you're
still looking for the non-drm aperture + vfio series to go in through my
vfio tree.  Thanks,

Alex
Thomas Zimmermann June 21, 2022, 11:01 a.m. UTC | #5
Hi

Am 17.06.22 um 16:12 schrieb Alex Williamson:
> On Fri, 17 Jun 2022 14:41:01 +0200
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
> 
>> Hi
>>
>> Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas:
>>> [adding Zack and Alex to Cc list]
>>>
>>> Hello Thomas,
>>>
>>> Thanks a lot for tracking this down and figuring out the root cause!
>>>
>>> On 6/17/22 14:10, Thomas Zimmermann wrote:
>>>> Always run fbdev removal first to remove simpledrm via
>>>> sysfb_disable(). This clears the internal state. The later call
>>>> to drm_aperture_detach_drivers() then does nothing. Otherwise,
>>>> with drm_aperture_detach_drivers() running first, the call to
>>>> sysfb_disable() uses inconsistent state.
>>>>
>>>> Example backtrace show below:
>>>>
>>>> [   11.663422] ==================================================================
>>>> [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
>>>> [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
>>>> [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
>>>> 	.19.0-rc2-1-default+ #1689
>>>> [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
>>>> [   11.663447] Call Trace:
>>>> [   11.663449]  <TASK>
>>>> [   11.663451]  ? device_del+0x79/0x5f0
>>>> [   11.663456]  dump_stack_lvl+0x5b/0x73
>>>> [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
>>>> [   11.663468]  ? device_del+0x79/0x5f0
>>>> [   11.663471]  ? device_del+0x79/0x5f0
>>>> [   11.663475]  print_report.cold+0x3c/0x21c
>>>> [   11.663481]  ? lock_acquired+0x87/0x1e0
>>>> [   11.663484]  ? lock_acquired+0x87/0x1e0
>>>> [   11.663489]  ? device_del+0x79/0x5f0
>>>> [   11.663492]  kasan_report+0xbf/0xf0
>>>> [   11.663498]  ? device_del+0x79/0x5f0
>>>> [   11.663503]  device_del+0x79/0x5f0
>>>> [   11.663509]  ? device_remove_attrs+0x170/0x170
>>>> [   11.663514]  ? lock_is_held_type+0xe8/0x140
>>>> [   11.663523]  platform_device_del.part.0+0x19/0xe0
>>>> [   11.663530]  platform_device_unregister+0x1c/0x30
>>>> [   11.663535]  sysfb_disable+0x2d/0x70
>>>> [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
>>>> [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
>>>> [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
>>>> [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
>>>> [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
>>>>   
>>>
>>> Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
>>> this seems to be the exact same issue that he reported yesterday.
>>
>> I'll do.
>>
>>>
>>> Patch looks good to me and this seems to be the correct fix IMO.
>>> That way we won't re-introduce the issue that was fixed by the
>>> sysfb_unregister() function, that is to remove a pdev even if wasn't
>>> bound to a driver to prevent a late simpledrm registration to match.
>>>
>>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
>>
>> Thanks!
>>
>>>
>>> I wonder what's the best way to coordinate with Alex to merge this
>>> fix and your patch that moves the aperture code out of DRM, since
>>> as far as I can tell both should target the v5.20 release.
>>
>> If nothing else comes in, I'll merge this patch on Monday and send Alex
>> an updated version of the vfio patch.
> 
> Please also publish a topic branch for the base of that patch if you're
> still looking for the non-drm aperture + vfio series to go in through my
> vfio tree.  Thanks,

I have merge the aperture fix, but the vfio thing is getting 
complicated. Can we merge your vfio patches through drm-misc-next? As 
the vfio-side of the change already got an r-b from Javier, it should 
show up in v5.20 then.

Best regards
Thomas

> 
> Alex
>
Alex Williamson June 21, 2022, 5:39 p.m. UTC | #6
On Tue, 21 Jun 2022 13:01:08 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:

> Hi
> 
> Am 17.06.22 um 16:12 schrieb Alex Williamson:
> > On Fri, 17 Jun 2022 14:41:01 +0200
> > Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >   
> >> Hi
> >>
> >> Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas:  
> >>> [adding Zack and Alex to Cc list]
> >>>
> >>> Hello Thomas,
> >>>
> >>> Thanks a lot for tracking this down and figuring out the root cause!
> >>>
> >>> On 6/17/22 14:10, Thomas Zimmermann wrote:  
> >>>> Always run fbdev removal first to remove simpledrm via
> >>>> sysfb_disable(). This clears the internal state. The later call
> >>>> to drm_aperture_detach_drivers() then does nothing. Otherwise,
> >>>> with drm_aperture_detach_drivers() running first, the call to
> >>>> sysfb_disable() uses inconsistent state.
> >>>>
> >>>> Example backtrace show below:
> >>>>
> >>>> [   11.663422] ==================================================================
> >>>> [   11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0
> >>>> [   11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
> >>>> [   11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G            E     5
> >>>> 	.19.0-rc2-1-default+ #1689
> >>>> [   11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
> >>>> [   11.663447] Call Trace:
> >>>> [   11.663449]  <TASK>
> >>>> [   11.663451]  ? device_del+0x79/0x5f0
> >>>> [   11.663456]  dump_stack_lvl+0x5b/0x73
> >>>> [   11.663462]  print_address_description.constprop.0+0x1f/0x1b0
> >>>> [   11.663468]  ? device_del+0x79/0x5f0
> >>>> [   11.663471]  ? device_del+0x79/0x5f0
> >>>> [   11.663475]  print_report.cold+0x3c/0x21c
> >>>> [   11.663481]  ? lock_acquired+0x87/0x1e0
> >>>> [   11.663484]  ? lock_acquired+0x87/0x1e0
> >>>> [   11.663489]  ? device_del+0x79/0x5f0
> >>>> [   11.663492]  kasan_report+0xbf/0xf0
> >>>> [   11.663498]  ? device_del+0x79/0x5f0
> >>>> [   11.663503]  device_del+0x79/0x5f0
> >>>> [   11.663509]  ? device_remove_attrs+0x170/0x170
> >>>> [   11.663514]  ? lock_is_held_type+0xe8/0x140
> >>>> [   11.663523]  platform_device_del.part.0+0x19/0xe0
> >>>> [   11.663530]  platform_device_unregister+0x1c/0x30
> >>>> [   11.663535]  sysfb_disable+0x2d/0x70
> >>>> [   11.663540]  remove_conflicting_framebuffers+0x1c/0xf0
> >>>> [   11.663546]  remove_conflicting_pci_framebuffers+0x130/0x1a0
> >>>> [   11.663554]  drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
> >>>> [   11.663561]  ? mgag200_pci_remove+0x30/0x30 [mgag200]
> >>>> [   11.663578]  mgag200_pci_probe+0x2d/0x140 [mgag200]
> >>>>     
> >>>
> >>> Maybe include a Reported-by: Zack Rusin <zackr@vmware.com> ? since
> >>> this seems to be the exact same issue that he reported yesterday.  
> >>
> >> I'll do.
> >>  
> >>>
> >>> Patch looks good to me and this seems to be the correct fix IMO.
> >>> That way we won't re-introduce the issue that was fixed by the
> >>> sysfb_unregister() function, that is to remove a pdev even if wasn't
> >>> bound to a driver to prevent a late simpledrm registration to match.
> >>>
> >>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>  
> >>
> >> Thanks!
> >>  
> >>>
> >>> I wonder what's the best way to coordinate with Alex to merge this
> >>> fix and your patch that moves the aperture code out of DRM, since
> >>> as far as I can tell both should target the v5.20 release.  
> >>
> >> If nothing else comes in, I'll merge this patch on Monday and send Alex
> >> an updated version of the vfio patch.  
> > 
> > Please also publish a topic branch for the base of that patch if you're
> > still looking for the non-drm aperture + vfio series to go in through my
> > vfio tree.  Thanks,  
> 
> I have merge the aperture fix, but the vfio thing is getting 
> complicated. Can we merge your vfio patches through drm-misc-next? As 
> the vfio-side of the change already got an r-b from Javier, it should 
> show up in v5.20 then.

Sure, if you'd like to take
165541193265.1955826.8778757616438743090.stgit@omen via the drm tree,
feel free, it's obviously the more trivial change of the series.
Thanks,

Alex
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c
index 74bd4a76b253..059fd71424f6 100644
--- a/drivers/gpu/drm/drm_aperture.c
+++ b/drivers/gpu/drm/drm_aperture.c
@@ -329,7 +329,20 @@  int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
 						     const struct drm_driver *req_driver)
 {
 	resource_size_t base, size;
-	int bar, ret = 0;
+	int bar, ret;
+
+	/*
+	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
+	 * otherwise the vga fbdev driver falls over.
+	 */
+#if IS_REACHABLE(CONFIG_FB)
+	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
+	if (ret)
+		return ret;
+#endif
+	ret = vga_remove_vgacon(pdev);
+	if (ret)
+		return ret;
 
 	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
 		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
@@ -339,15 +352,6 @@  int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
 		drm_aperture_detach_drivers(base, size);
 	}
 
-	/*
-	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
-	 * otherwise the vga fbdev driver falls over.
-	 */
-#if IS_REACHABLE(CONFIG_FB)
-	ret = remove_conflicting_pci_framebuffers(pdev, req_driver->name);
-#endif
-	if (ret == 0)
-		ret = vga_remove_vgacon(pdev);
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(drm_aperture_remove_conflicting_pci_framebuffers);