[RFC,drm-misc-next,v4,0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

Message ID 20230904195724.633404-1-sui.jingfeng@linux.dev

Message

Sui Jingfeng Sept. 4, 2023, 7:57 p.m. UTC
From: Sui Jingfeng <suijingfeng@loongson.cn>

On a machine with multiple GPUs, a Linux user has no control over which
one is primary at boot time. This series tries to solve this problem by
introducing the ->be_primary() function stub. Specific device drivers can
provide an implementation and hook it up to this stub by calling the
vga_client_register() function.

Once a driver has bound a device successfully, VGAARB calls back into the
device driver to ask whether it wants the device to be primary. Device
drivers with no such need can simply pass NULL.
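The query flow described above can be sketched as a small user-space model. This is a hypothetical simplification, not code from the series: `struct vga_client_model`, the helper names and the `bool (*be_primary)(void *priv)` callback signature are all assumptions, and the real `vga_client_register()` takes a `struct pci_dev *` whose exact extended signature in this series is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a registered VGA client (not the kernel struct). */
struct vga_client_model {
	const char *name;               /* e.g. "ast 0000:04:00.0" */
	bool (*be_primary)(void *priv); /* optional; NULL means "no preference" */
	void *priv;                     /* driver-private data for the callback */
};

/*
 * Arbiter side: called after the driver has bound its device.
 * A client that registered a NULL callback never asks to be primary.
 */
static bool model_query_be_primary(const struct vga_client_model *client)
{
	assert(client != NULL);
	if (client->be_primary == NULL)
		return false;
	return client->be_primary(client->priv);
}

/* Driver side: an example callback whose answer lives in driver data. */
static bool example_be_primary(void *priv)
{
	const bool *wants_primary = priv;

	return *wants_primary;
}
```

In this model the arbiter owns the decision point and drivers only answer a yes/no question; a driver that passes NULL is simply never considered.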

Please note that:

1) ARM64, LoongArch, and MIPS servers have many PCIe slots, and I would
   like to mount at least three video cards.

2) Typically, these non-x86 machines lack good UEFI firmware support, so
   the primary GPU cannot be selected at the firmware stage. Even on x86,
   some old UEFI firmware has already made an undesired decision for you.

3) This series attempts to solve the remaining problems at the driver
   level, while another series of mine [1] targets the majority of the
   problems at the device level.

Tested (limited) on x86 with four video cards mounted. The Intel UHD
Graphics 630 is the default boot VGA device; it was successfully
overridden by the AST2400 by appending ast.modeset=10 to the kernel
command line.

$ lspci | grep VGA

 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
 04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
 05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)

$ sudo dmesg | grep vgaarb

 pci 0000:00:02.0: vgaarb: setting as boot VGA device
 pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
 pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 vgaarb: loaded
 ast 0000:04:00.0: vgaarb: Override as primary by driver
 i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
 radeon 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
 ast 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
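The `decodes=`/`owns=`/`locks=` fields in the log above are bitmasks of legacy VGA resources rendered as text. A small user-space sketch modeled on how vgaarb formats them (the helper names are illustrative, and the flag values are assumed to mirror `include/linux/vgaarb.h` rather than copied from the series):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Resource bits; the values are assumed to mirror the VGA_RSRC_* macros
 * in include/linux/vgaarb.h (check them against your tree).
 */
#define VGA_RSRC_NONE       0x00
#define VGA_RSRC_LEGACY_IO  0x01
#define VGA_RSRC_LEGACY_MEM 0x02
#define VGA_RSRC_NORMAL_IO  0x04
#define VGA_RSRC_NORMAL_MEM 0x08

/* Only the legacy VGA bits are printed; the "normal" bits are masked off. */
static const char *iostate_to_str(unsigned int iostate)
{
	iostate &= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
	switch (iostate) {
	case VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM:
		return "io+mem";
	case VGA_RSRC_LEGACY_IO:
		return "io";
	case VGA_RSRC_LEGACY_MEM:
		return "mem";
	}
	return "none";
}

/* Build the summary printed for each "VGA device added" line. */
static int format_added(char *buf, size_t len, unsigned int decodes,
			unsigned int owns, unsigned int locks)
{
	assert(buf != NULL);
	return snprintf(buf, len, "decodes=%s,owns=%s,locks=%s",
			iostate_to_str(decodes), iostate_to_str(owns),
			iostate_to_str(locks));
}
```

Read this way, the boot VGA device (00:02.0) initially decodes and owns both legacy ranges, the other three cards decode them but own nothing, and the later `decodes=none` lines show each driver giving up legacy decoding.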

v2:
	* Add a simple implementation for drm/i915 and drm/ast
	* Pick up all tags (Mario)
v3:
	* Fix a mistake in the drm/i915 implementation
	* Fix the patch failing to apply because of a merge conflict
v4:
	* Focus on solving the real problem.

v1,v2 at https://patchwork.freedesktop.org/series/120059/
   v3 at https://patchwork.freedesktop.org/series/120562/

[1] https://patchwork.freedesktop.org/series/122845/

Sui Jingfeng (9):
  PCI/VGA: Allowing the user to select the primary video adapter at boot
    time
  drm/nouveau: Implement .be_primary() callback
  drm/radeon: Implement .be_primary() callback
  drm/amdgpu: Implement .be_primary() callback
  drm/i915: Implement .be_primary() callback
  drm/loongson: Implement .be_primary() callback
  drm/ast: Register as a VGA client by calling vga_client_register()
  drm/hibmc: Register as a VGA client by calling vga_client_register()
  drm/gma500: Register as a VGA client by calling vga_client_register()

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 11 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 13 ++++-
 drivers/gpu/drm/ast/ast_drv.c                 | 31 ++++++++++
 drivers/gpu/drm/gma500/psb_drv.c              | 57 ++++++++++++++++++-
 .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 15 +++++
 drivers/gpu/drm/i915/display/intel_vga.c      | 15 ++++-
 drivers/gpu/drm/loongson/loongson_module.c    |  2 +-
 drivers/gpu/drm/loongson/loongson_module.h    |  1 +
 drivers/gpu/drm/loongson/lsdc_drv.c           | 10 +++-
 drivers/gpu/drm/nouveau/nouveau_vga.c         | 11 +++-
 drivers/gpu/drm/radeon/radeon_device.c        | 10 +++-
 drivers/pci/vgaarb.c                          | 43 ++++++++++++--
 drivers/vfio/pci/vfio_pci_core.c              |  2 +-
 include/linux/vgaarb.h                        |  8 ++-
 14 files changed, 210 insertions(+), 19 deletions(-)

Comments

Jani Nikula Sept. 5, 2023, 10:38 a.m. UTC | #1
On Tue, 05 Sep 2023, Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
> Tested (limited) on x86 with four video cards mounted. The Intel UHD
> Graphics 630 is the default boot VGA device; it was successfully
> overridden by the AST2400 by appending ast.modeset=10 to the kernel
> command line.

The value 10 is incredibly arbitrary, and multiplied as a magic number
all over the place.

> $ lspci | grep VGA
>
>  00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
>  04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
>  05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)

In this example, all of the GPUs are driven by different drivers. What
good does a module parameter do if you have multiple GPUs of the same
model, all driven by the same driver module?
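Jani's objection can be made concrete with a small user-space model. All names are hypothetical, and the series' real tie-breaking rule is not described in this thread, so "first claimant wins" below is an assumption. Because a module parameter is shared by every device a driver binds, the callback gives the same answer for every identical card:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-wide parameter, shared by every bound device. */
static int model_modeset = 10;

struct model_gpu {
	const char *pci_addr; /* e.g. "0000:04:00.0" */
	const char *driver;   /* e.g. "ast" */
};

/* The callback cannot tell devices apart: it only sees the shared value. */
static bool model_be_primary(const struct model_gpu *gpu)
{
	(void)gpu; /* device identity plays no part in the decision */
	return model_modeset == 10;
}

/* Assumed arbitration policy for this sketch: first claimant wins. */
static const struct model_gpu *model_pick_primary(const struct model_gpu *gpus,
						   size_t n, size_t *claimants)
{
	const struct model_gpu *primary = NULL;

	assert(claimants != NULL);
	*claimants = 0;
	for (size_t i = 0; i < n; i++) {
		if (model_be_primary(&gpus[i])) {
			(*claimants)++;
			if (primary == NULL)
				primary = &gpus[i];
		}
	}
	return primary;
}
```

With two AST cards, both answer yes, and which one becomes primary depends on the arbitration order rather than on anything the user specified.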

BR,
Jani.

Thomas Zimmermann Sept. 5, 2023, 10:45 a.m. UTC | #2
Hi

On 04.09.23 at 21:57, Sui Jingfeng wrote:
> From: Sui Jingfeng <suijingfeng@loongson.cn>
> 
> On a machine with multiple GPUs, a Linux user has no control over which
> one is primary at boot time.

If anything, the primary graphics adapter is the one initialized by the 
firmware. I think our boot-up graphics also make this assumption implicitly.

But what's the use case for overriding this setting?

Best regards
Thomas

Thomas Zimmermann Sept. 5, 2023, 10:49 a.m. UTC | #3
Hi

On 04.09.23 at 21:57, Sui Jingfeng wrote:
> Tested (limited) on x86 with four video cards mounted. The Intel UHD
> Graphics 630 is the default boot VGA device; it was successfully
> overridden by the AST2400 by appending ast.modeset=10 to the kernel
> command line.

FYI: per-driver modeset parameters are deprecated and not to be used. 
Please don't promote them. You can use modprobe.blacklist or 
initcall_blacklist on the kernel command line.
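As a rough illustration of Thomas's suggestion: with several GPUs installed, a command line such as `modprobe.blacklist=i915,radeon` (hypothetical values) keeps modprobe from loading those drivers, so only the remaining ones bind; for built-in drivers, `initcall_blacklist`, which takes initcall function names, plays the equivalent role. The sketch below is a simplified model of matching a module name against that list. It is not kmod's actual parser: it ignores quoting and the `-`/`_` equivalence in module names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `module` appears in a comma-separated
 * "modprobe.blacklist=" list on the given command line.
 * Simplified: no quote handling, no '-'/'_' normalization.
 */
static bool cmdline_blacklists(const char *cmdline, const char *module)
{
	static const char key[] = "modprobe.blacklist=";
	const size_t klen = sizeof(key) - 1;
	const size_t mlen = strlen(module);
	const char *p = cmdline;

	assert(cmdline != NULL && module != NULL);
	while (*p != '\0') {
		/* Find the end of the current space-separated token. */
		const char *end = strchr(p, ' ');
		if (end == NULL)
			end = p + strlen(p);

		if ((size_t)(end - p) > klen && strncmp(p, key, klen) == 0) {
			/* Walk the comma-separated values of this token. */
			const char *v = p + klen;
			while (v < end) {
				const char *comma = memchr(v, ',', (size_t)(end - v));
				const char *vend = comma ? comma : end;
				if ((size_t)(vend - v) == mlen &&
				    strncmp(v, module, mlen) == 0)
					return true;
				v = vend + 1;
			}
		}
		p = (*end == ' ') ? end + 1 : end;
	}
	return false;
}
```

Unlike a per-driver modeset value, this leaves the firmware-initialized device alone and simply removes competing drivers from the boot.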

Best regards
Thomas

Christian König Sept. 5, 2023, 1:28 p.m. UTC | #4
On 05.09.23 at 12:38, Jani Nikula wrote:
> On Tue, 05 Sep 2023, Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
>> 1) ARM64, LoongArch, and MIPS servers have many PCIe slots, and I would
>>     like to mount at least three video cards.

Well, you rarely find a board which can actually handle a single one :)

>> Tested (limited) on x86 with four video cards mounted. The Intel UHD
>> Graphics 630 is the default boot VGA device; it was successfully
>> overridden by the AST2400 by appending ast.modeset=10 to the kernel
>> command line.
> The value 10 is incredibly arbitrary, and multiplied as a magic number
> all over the place.

+1

>
>> $ lspci | grep VGA
>>
>>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
>>   01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
>>   04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
>>   05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)
> In this example, all of the GPUs are driven by different drivers. What
> good does a module parameter do if you have multiple GPUs of the same
> model, all driven by the same driver module?

Completely agree. The question is: what is the benefit for the end user
of actually specifying this?

If you want the initial console on a different device, then implement a
kernel option for vgaarb and *not* the drivers.
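Christian's alternative, a single option owned by vgaarb that names the device, would sidestep the same-driver ambiguity because a PCI address is unique per device. A minimal sketch of the parsing and matching step, assuming a hypothetical option value in `lspci -D` address form; no such vgaarb parameter is defined in this thread, so the format is illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* A PCI address in domain:bus:device.function form. */
struct pci_addr {
	unsigned int domain, bus, dev, fn;
};

/* Parse "DDDD:BB:DD.F"; returns true on success. */
static bool parse_pci_addr(const char *s, struct pci_addr *out)
{
	char trailing;

	assert(s != NULL && out != NULL);
	/* The extra %c must NOT match: it rejects trailing garbage. */
	if (sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
		   &out->dev, &out->fn, &trailing) != 4)
		return false;
	/* Range-check bus (8 bits), device (5 bits) and function (3 bits). */
	return out->bus <= 0xff && out->dev <= 0x1f && out->fn <= 0x7;
}

static bool pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->dev == b->dev && a->fn == b->fn;
}
```

The arbiter would then compare each VGA device's address against the parsed value, so the choice no longer depends on which driver binds which card.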

Regards,
Christian.

Sui Jingfeng Sept. 5, 2023, 1:30 p.m. UTC | #5
Hi,


On 2023/9/5 18:45, Thomas Zimmermann wrote:
> Hi
>
> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> On a machine with multiple GPUs, a Linux user has no control over which
>> one is primary at boot time.
>
> If anything, the primary graphics adapter is the one initialized by 
> the firmware. I think our boot-up graphics also make this assumption 
> implicitly.
>

Yes, but by the time the DRM drivers are loaded successfully, the boot-up graphics phase has already finished.
The firmware framebuffer device has already been removed by drm_aperture_remove_conflicting_pci_framebuffers()
(or one of its siblings). So this series is definitely not meant to interact with the firmware framebuffer
(or with more intelligent framebuffer drivers). It is for user-space programs, such as the X server and Wayland
compositors. It is for Linux users and DRM driver testers, allowing them to direct the graphics display server
to use the hardware of interest as the primary video card.

Also, I believe that the X server and Wayland compositors are the best test cases.
If a specific DRM driver can't work as the primary with the X server,
then something is probably wrong.


> But what's the use case for overriding this setting?
>

On a machine with multiple GPUs mounted,
only the primary graphics card gets POST-ed (initialized) by the firmware.
Therefore, the DRM drivers for the remaining video cards have to work
without the prerequisite setup done by the firmware, which is called POST.

One use case of this series is to test whether a specific DRM driver works properly
even when the firmware has done none of the prerequisite work.
And it seems that the results are not satisfying in all cases.

drm/ast is the first DRM driver that refused to work if the device had not been POST-ed by the firmware.

Before applying this series, I was unable to easily make drm/ast the primary video card. In a
multiple-video-card configuration, the monitor connected to the AST2400 did not light up.
Confused, a naive programmer might suspect that PRIME is not working.

After applying this series and passing ast.modeset=10 on the kernel command line,
I found that the monitor connected to my AST2400 video card was still black;
it didn't display any image.

While studying drm/ast, I learned that the driver ships POST code
(see the ast_post_gpu() function), so I wondered why this function didn't work.
After a short (hasty) debugging session, I found that ast_post_gpu()
didn't get run, because this is tied to ast->config_mode.

Without thinking too much, I hardcoded ast->config_mode to ast_use_p2a to
force the ast_post_gpu() function to run.

```
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device *ast)
                 }
         }
 
+       ast->config_mode = ast_use_p2a;
+
         switch (ast->config_mode) {
         case ast_use_defaults:
                 drm_info(dev, "Using default configuration\n");
```

Then the monitor lit up and displayed the Ubuntu greeter.
Therefore, my patch is helpful, at least for Linux DRM driver testers and developers.
It allows programmers to test a specific part of a specific driver
without changing a line of source code and without needing sudo privileges.
It helps improve the efficiency of testing and patch verification.

I know about the PrimaryGPU option in the Xorg configuration, but that approach persists
the setup that has been made: you need root privileges to modify it each time you want
to switch the primary GPU. When rapidly developing and/or testing multiple video drivers
with only one machine available, what we really want is probably a
one-shot command like the one this series provides.

So this is the first use case. This probably also helps to test full modesets,
PRIME, and reverse PRIME on machines with multiple video cards.


Sui Jingfeng Sept. 5, 2023, 2:28 p.m. UTC | #6
Hi,

On 2023/9/5 21:28, Christian König wrote:
>>> Tested (limited) on x86 with four video cards mounted. The Intel UHD
>>> Graphics 630 is the default boot VGA device; it was successfully
>>> overridden by the AST2400 by appending ast.modeset=10 to the kernel
>>> command line.
>> The value 10 is incredibly arbitrary, and multiplied as a magic number
>> all over the place.
>
> +1 


This is exactly why I made this series an RFC: this is an open-ended problem.
The choices 3, 4, 5, 6, 7, 8, and 9 are as arbitrary as '10'. '1' and '2' are
definitely not suitable, because those seats have already been taken.

Take drm/nouveau as an example:


```

MODULE_PARM_DESC(modeset, "enable driver (default: auto, "
		          "0 = disabled, 1 = enabled, 2 = headless)");
int nouveau_modeset = -1;
module_param_named(modeset, nouveau_modeset, int, 0400);

```


'1' is for enabling the DRM driver; some drivers even let it override the 'nomodeset' parameter.

'2' is not suitable, because nouveau uses it for a headless GPU (a render-only or compute-class GPU?).

'3' is also probably not the best choice; the concern is:
what if a specific DRM driver wants to extend the usage in the future?


The reasons I picked the digit '10' are:


1) The modeset parameter is unlikely to grow to 10 usages.

Other DRM drivers only use '-1', '0', and '1'; choosing '2' would conflict with drm/nouveau.
Picking '10' leaves some room for various device driver authors.
It also helps keep the usage consistent across drivers.
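The value space described above can be written out as a small decoder. This is a sketch only: the meanings of -1, 0, 1 and 2 follow the nouveau parameter description and the values other drivers use, 10 is the value proposed in this series, and `MODEL_MODESET_PRIMARY` is a hypothetical name standing in for the bare literal the reviewers objected to.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical name for the proposed value, instead of a bare 10. */
#define MODEL_MODESET_PRIMARY 10

/* Meanings as discussed in this thread; anything else is unrecognized. */
static const char *describe_modeset(int modeset)
{
	switch (modeset) {
	case -1:
		return "auto (default)";
	case 0:
		return "disabled";
	case 1:
		return "enabled";
	case 2:
		return "headless (nouveau)";
	case MODEL_MODESET_PRIMARY:
		return "enabled, request primary (proposed)";
	default:
		return "unrecognized";
	}
}

static bool modeset_requests_primary(int modeset)
{
	return modeset == MODEL_MODESET_PRIMARY;
}
```

A shared named constant addresses the readability half of the reviewers' objection; it does not address Jani's point that a per-driver parameter cannot single out one of several identical cards.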


2) An int takes up 4 bytes, and I don't want to waste even a single byte.

In defending my patch, I have to say that
adding another kernel command-line parameter would waste precious RAM.

An int can encode 2^31 usages; why can't we improve the utilization rate?

3) Please consider that modeset is the most common and attractive parameter.

No name is better than 'modeset', as other names are not easy to remember.

Again, this is for Linux users, so it is not arbitrary.
Though the choice looks simple and trivial, I thought about it for more than a week.
Alex Williamson Sept. 5, 2023, 2:52 p.m. UTC | #7
On Tue,  5 Sep 2023 03:57:15 +0800
Sui Jingfeng <sui.jingfeng@linux.dev> wrote:

> $ lspci | grep VGA
> 
>  00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]

In all my previous experiments with VGA routing and IGD I found that
IGD can't actually release VGA routing and Intel confirmed the hardware
doesn't have the ability to do so.  It will always be primary from a
VGA routing perspective.  Was this actually tested with non-UEFI?

I suspect it might only work in UEFI mode where we probably don't
actually have a dependency on VGA routing.  This is essentially why
vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
broken to use on Intel systems with IGD.  Thanks,

Alex

>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
>  04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
>  05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)
> 
> $ sudo dmesg | grep vgaarb
> 
>  pci 0000:00:02.0: vgaarb: setting as boot VGA device
>  pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
>  pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
>  pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
>  pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
>  vgaarb: loaded
>  ast 0000:04:00.0: vgaarb: Override as primary by driver
>  i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
>  radeon 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
>  ast 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> 
> v2:
> 	* Add a simple implemment for drm/i915 and drm/ast
> 	* Pick up all tags (Mario)
> v3:
> 	* Fix a mistake for drm/i915 implement
> 	* Fix patch can not be applied problem because of merge conflect.
> v4:
> 	* Focus on solve the real problem.
> 
> v1,v2 at https://patchwork.freedesktop.org/series/120059/
>    v3 at https://patchwork.freedesktop.org/series/120562/
> 
> [1] https://patchwork.freedesktop.org/series/122845/
> 
> Sui Jingfeng (9):
>   PCI/VGA: Allowing the user to select the primary video adapter at boot
>     time
>   drm/nouveau: Implement .be_primary() callback
>   drm/radeon: Implement .be_primary() callback
>   drm/amdgpu: Implement .be_primary() callback
>   drm/i915: Implement .be_primary() callback
>   drm/loongson: Implement .be_primary() callback
>   drm/ast: Register as a VGA client by calling vga_client_register()
>   drm/hibmc: Register as a VGA client by calling vga_client_register()
>   drm/gma500: Register as a VGA client by calling vga_client_register()
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 11 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 13 ++++-
>  drivers/gpu/drm/ast/ast_drv.c                 | 31 ++++++++++
>  drivers/gpu/drm/gma500/psb_drv.c              | 57 ++++++++++++++++++-
>  .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 15 +++++
>  drivers/gpu/drm/i915/display/intel_vga.c      | 15 ++++-
>  drivers/gpu/drm/loongson/loongson_module.c    |  2 +-
>  drivers/gpu/drm/loongson/loongson_module.h    |  1 +
>  drivers/gpu/drm/loongson/lsdc_drv.c           | 10 +++-
>  drivers/gpu/drm/nouveau/nouveau_vga.c         | 11 +++-
>  drivers/gpu/drm/radeon/radeon_device.c        | 10 +++-
>  drivers/pci/vgaarb.c                          | 43 ++++++++++++--
>  drivers/vfio/pci/vfio_pci_core.c              |  2 +-
>  include/linux/vgaarb.h                        |  8 ++-
>  14 files changed, 210 insertions(+), 19 deletions(-)
>
Thomas Zimmermann Sept. 5, 2023, 3:05 p.m. UTC | #8
Hi

Am 05.09.23 um 15:30 schrieb suijingfeng:
> Hi,
> 
> 
> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>
>>> On a machine with multiple GPUs, a Linux user has no control over which
>>> one is primary at boot time. This series tries to solve above mentioned
>>
>> If anything, the primary graphics adapter is the one initialized by 
>> the firmware. I think our boot-up graphics also make this assumption 
>> implicitly.
>>
> 
> Yes, but by the time of DRM drivers get loaded successfully,the boot-up 
> graphics already finished.
> Firmware framebuffer device already get killed by the 
> drm_aperture_remove_conflicting_pci_framebuffers()
> function (or its siblings). So, this series is definitely not to 
> interact with the firmware framebuffer

Yes and no. The helpers you mention will attempt to remove the firmware 
framebuffer on the given PCI device. If you have multiple PCI devices, 
the other devices would not be affected.

This also means that probing a non-primary card will not affect the 
firmware framebuffer on the primary card. You can have all these drivers 
co-exist next to each other. If you link a full DRM driver into the 
kernel image, it might even be loaded before the firmware-framebuffer's 
driver.  We had some funny bugs from these interactions.
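The per-device behaviour described above can be modelled as a range-overlap test. As far as I understand the aperture helpers, a firmware framebuffer is removed only if its memory range overlaps one of the memory BARs of the PCI device being probed. The sketch below is a userspace model under that assumption; `struct range` and `fb_conflicts_with_device()` are hypothetical names, and the extra handling the real helpers do for the legacy VGA console on the primary device is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an address range, [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

static bool ranges_overlap(struct range a, struct range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Returns true if a firmware framebuffer at 'fb' would be removed when a
 * native driver claims the memory BARs in 'bars' (n entries). Empty BARs
 * (size 0) are skipped. */
static bool fb_conflicts_with_device(struct range fb,
				     const struct range *bars, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (bars[i].size && ranges_overlap(fb, bars[i]))
			return true;
	}
	return false;
}
```

With a firmware framebuffer inside the IGD's aperture, probing the ASPEED device (whose BARs lie elsewhere) reports no conflict, so that framebuffer stays in place.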


> (or more intelligent framebuffer drivers).  It is for user space 
> program, such as X server and Wayland
> compositor. Its for Linux user or drm drivers testers, which allow them 
> to direct graphic display server
> using right hardware of interested as primary video card.
> 
> Also, I believe that X server and Wayland compositor are the best test 
> examples.
> If a specific DRM driver can't work with X server as a primary,
> then there probably have something wrong.

If you want to run a userspace compositor or X11 on a certain device, 
you should configure this in the program's config files, not on the 
kernel command line.

The whole concept of a 'primary' display is bogus IMHO. It only exists 
because old VGA and BIOS (and their equivalents on non-PC systems) were 
unable to use more than one graphics device. Hence, as you write below, 
only the first device got POSTed by the BIOS. If you had an additional 
card, the device driver needed to perform the POSTing.

However, on modern Linux systems the primary display does not really 
exist. 'Primary' is the device that is available via VGA, VESA or EFI. 
Our drivers don't use these interfaces, but the native registers. As you 
said yourself, these firmware devices (VGA, VESA, EFI) are removed ASAP 
by the native drivers.

> 
> 
>> But what's the use case for overriding this setting?
>>
> 
> On a specific machine with multiple GPUs mounted,
> only the primary graphics get POST-ed (initialized) by the firmware.
> Therefore, the DRM drivers for the rest video cards, have to choose to
> work without the prerequisite setups done by firmware, This is called as 
> POST.
> 
> One of the use cases of this series is to test if a specific DRM driver 
> could works properly,
> even though there is no prerequisite works have been done by firmware at 
> all.
> And it seems that the results is not satisfying in all cases.
> 
> drm/ast is the first drm drivers which refused to work if not being 
> POST-ed by the firmware.

You might have found a bug in the ast driver. Ast has means to detect if 
the device has been POSTed and to POST it if necessary. If this doesn't 
work correctly, it needs a fix.

As Christian mentioned, if anything, you might add an option to specify 
the default card to vgaarb (e.g., as PCI slot). But userspace should 
avoid the idea of a primary card IMHO.
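Naming the default card to vgaarb by PCI slot would need little more than parsing a slot string. This is a minimal sketch, assuming a hypothetical option whose value uses the `DDDD:BB:DD.F` notation that `lspci -D` prints; no such vgaarb parameter is defined in this thread, and `struct pci_slot` and `parse_pci_slot()` are illustrative names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parsed form of a PCI address "DDDD:BB:DD.F". */
struct pci_slot {
	unsigned int domain, bus, dev, fn;
};

/* Parse "DDDD:BB:DD.F" (hex fields, as printed by lspci -D).
 * Returns true on success; rejects trailing characters and
 * out-of-range bus, device or function numbers. */
static bool parse_pci_slot(const char *s, struct pci_slot *out)
{
	char trailing;
	int n = sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
		       &out->dev, &out->fn, &trailing);

	if (n != 4)
		return false;
	return out->domain <= 0xffff && out->bus <= 0xff &&
	       out->dev <= 0x1f && out->fn <= 0x7;
}
```

An arbiter could then compare the parsed slot with `pci_domain_nr(pdev->bus)`, `pdev->bus->number`, `PCI_SLOT(pdev->devfn)` and `PCI_FUNC(pdev->devfn)` of each VGA device it adds.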

Best regards
Thomas

> 
> Before apply this series, I was unable make drm/ast as the primary video 
> card easily. On a
> multiple video card configuration, the monitor connected with the 
> AST2400 not light up.
> While confusing, a naive programmer may suspect the PRIME is not working.
> 
> After applied this series and passing ast.modeset=10 on the kernel cmd 
> line,
> I found that the monitor connected with my ast2400 video card still black,
> It doesn't display and doesn't show image to me.
> 
> While in the process of study drm/ast, I know that drm/ast driver has 
> the POST code shipped.
> See the ast_post_gpu() function, then, I was wondering why this function 
> doesn't works.
> After a short-time (hasty) debugging, I found that the the 
> ast_post_gpu() function
> didn't get run. Because it have something to do with the ast->config_mode.
> 
> Without thinking too much, I hardcoded the ast->config_mode as 
> ast_use_p2a to
> force the ast_post_gpu() function get run.
> 
> ```
> 
> --- a/drivers/gpu/drm/ast/ast_main.c
> +++ b/drivers/gpu/drm/ast/ast_main.c
> @@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device *ast)
>                  }
>          }
> 
> +       ast->config_mode = ast_use_p2a;
> +
>          switch (ast->config_mode) {
>          case ast_use_defaults:
>                  drm_info(dev, "Using default configuration\n");
> 
> ```
> 
> Then, the monitor light up, it display the Ubuntu greeter to me.
> Therefore, my patch is helpful, at lease for the Linux drm driver tester 
> and developer.
> It allow programmers to test the specific part of the specific drive
> without changing a line of the source code and without the need of sudo 
> authority.
> It helps to improve efficiency of the testing and patch verification.
> 
> I know the PrimaryGPU option of Xorg conf, but this approach will 
> remember the setup
> have been made, you need modify it with root authority each time you 
> want to switch
> the primary. But on rapid developing and/or testing multiple video 
> drivers, with
> only one computer hardware resource available. What we really want 
> probably is a
> one-shoot command as this series provide.
> 
> So, this is the first use case. This probably also help to test full 
> modeset,
> PRIME and reverse PRIME on multiple video card machine.
> 
> 
>> Best regards
>> Thomas
>>
>
Sui Jingfeng Sept. 5, 2023, 3:59 p.m. UTC | #9
On 2023/9/5 18:49, Thomas Zimmermann wrote:
> Hi
>
> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> On a machine with multiple GPUs, a Linux user has no control over which
>> one is primary at boot time. This series tries to solve above mentioned
>> problem by introduced the ->be_primary() function stub. The specific
>> device drivers can provide an implementation to hook up with this 
>> stub by
>> calling the vga_client_register() function.
>>
>> Once the driver bound the device successfully, VGAARB will call back to
>> the device driver. To query if the device drivers want to be primary or
>> not. Device drivers can just pass NULL if have no such needs.
>>
>> Please note that:
>>
>> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
>>     like to mount at least three video cards.
>>
>> 2) Typically, those non-86 machines don't have a good UEFI firmware
>>     support, which doesn't support select primary GPU as firmware stage.
>>     Even on x86, there are old UEFI firmwares which already made 
>> undesired
>>     decision for you.
>>
>> 3) This series is attempt to solve the remain problems at the driver 
>> level,
>>     while another series[1] of me is target to solve the majority of the
>>     problems at device level.
>>
>> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
>> 630 is the default boot VGA, successfully override by ast2400 with
>> ast.modeset=10 append at the kernel cmd line.
>
> FYI: per-driver modeset parameters are deprecated and not to be used. 
> Please don't promote them.


Well, please wait; let me explain.


drm/nouveau already promotes it a little.

There is no code of conduct or specification governing what module parameters should look like.
Still, many DRM drivers already support a modeset parameter, and the authors of the various
drivers try to keep its values from conflicting with each other. I believe this is a good thing for Linux users.
It is probably the responsibility of the DRM core maintainers to make the various DRM
drivers reach a minimal consensus. Doing so is probably painful and may not pay off,
but reaching a minimal consensus does benefit Linux users.


> You can use modprobe.blacklist or initcall_blacklist on the kernel 
> command line.
>
There are cases where modprobe.blacklist doesn't work;
I have run into several of them in the past.
The device selected by VGAARB is a device-level decision,
not a driver problem.

When VGAARB has a bug, it sometimes selects the wrong device as primary.
The X server then uses that device as primary and crashes completely,
because the device lacks a driver. Take my old S3 Graphics card as an example:

$ lspci | grep VGA

  00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
  03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
  07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
  08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)

Before applying this patch:

[    0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[    0.361753] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.361765] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361773] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361779] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361781] vgaarb: loaded
[    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
[    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
[    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070


For known reasons, one of my systems selects the S3 Graphics card as the primary GPU,
yet this card does not even have a decent upstream DRM driver. In such a case,
I have come to believe that only a device that has a driver deserves to be primary.

In that situation, I want to reboot into the graphical environment on one of the
other working video cards, either the integrated GPU or a discrete one.
That does not mean I should compromise by removing the S3 Graphics card from
the motherboard, nor that I should change my BIOS settings;
sometimes the BIOS is even worse.

With this series applied, all I need to do is reboot the computer and
pass a kernel command-line option. By forcing another video card (one with
decent driver support) to be primary, I can debug in a graphical
environment, for example to examine what is wrong with vgaarb
on a specific platform while running the X server.

I could also try compiling a driver for this card and see whether it works,
simply by rebooting without changing anything else. It is very efficient. So this
is the second use case of my patch: it hands control back to the
graphics developer.
Sui Jingfeng Sept. 5, 2023, 4:21 p.m. UTC | #10
Hi,

On 2023/9/5 22:52, Alex Williamson wrote:
> On Tue,  5 Sep 2023 03:57:15 +0800
> Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
>
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> On a machine with multiple GPUs, a Linux user has no control over which
>> one is primary at boot time. This series tries to solve above mentioned
>> problem by introduced the ->be_primary() function stub. The specific
>> device drivers can provide an implementation to hook up with this stub by
>> calling the vga_client_register() function.
>>
>> Once the driver bound the device successfully, VGAARB will call back to
>> the device driver. To query if the device drivers want to be primary or
>> not. Device drivers can just pass NULL if have no such needs.
>>
>> Please note that:
>>
>> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
>>     like to mount at least three video cards.
>>
>> 2) Typically, those non-86 machines don't have a good UEFI firmware
>>     support, which doesn't support select primary GPU as firmware stage.
>>     Even on x86, there are old UEFI firmwares which already made undesired
>>     decision for you.
>>
>> 3) This series is attempt to solve the remain problems at the driver level,
>>     while another series[1] of me is target to solve the majority of the
>>     problems at device level.
>>
>> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
>> 630 is the default boot VGA, successfully override by ast2400 with
>> ast.modeset=10 append at the kernel cmd line.
>>
>> $ lspci | grep VGA
>>
>>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
> In all my previous experiments with VGA routing and IGD I found that
> IGD can't actually release VGA routing and Intel confirmed the hardware
> doesn't have the ability to do so.  It will always be primary from a
> VGA routing perspective.  Was this actually tested with non-UEFI?

Yes, I have tested it on my Aspire E471 notebook (i5-5200U),
because that notebook uses legacy firmware (it also has UEFI; it is dual-firmware).
In the past I had difficulty installing Ubuntu on this machine under UEFI,
so I keep it on the legacy firmware.

It has two video cards: the IGD and an NVIDIA card (GeForce 840M).
NVIDIA's card identifies itself as a 3D controller (pci->class = 0x030200).

I have tested this patch together with the other series mentioned in [1].
I can tell you that the firmware framebuffer on this notebook uses vesafb, not efifb,
and the framebuffer size (lfb.size) is very small. This is very strange,
but I haven't had time to look into the details. It still works, though.

I use and test my patch whenever and wherever possible.
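For reference, `pci->class` packs the base class, sub-class and programming interface into 24 bits. The sketch below decodes it; the constants mirror `PCI_CLASS_DISPLAY_VGA` (0x0300) and `PCI_CLASS_DISPLAY_3D` (0x0302) from `include/linux/pci_ids.h`, re-declared locally for a userspace build, and the helper names are hypothetical. It shows why a 0x030200 device is a 3D controller rather than a VGA-compatible one, which matters because, as of this discussion, VGAARB only adds devices whose class is VGA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of PCI class values (see include/linux/pci_ids.h). */
#define PCI_BASE_CLASS_DISPLAY 0x03
#define PCI_CLASS_DISPLAY_VGA  0x0300
#define PCI_CLASS_DISPLAY_3D   0x0302

/* pdev->class layout: base class << 16 | sub-class << 8 | prog-if. */
static uint8_t pci_base_class(uint32_t class_code)
{
	return (class_code >> 16) & 0xff;
}

/* VGA-compatible controller (0x0300xx). */
static bool pci_is_vga_class(uint32_t class_code)
{
	return (class_code >> 8) == PCI_CLASS_DISPLAY_VGA;
}

/* 3D controller (0x0302xx), e.g. many laptop discrete GPUs. */
static bool pci_is_3d_class(uint32_t class_code)
{
	return (class_code >> 8) == PCI_CLASS_DISPLAY_3D;
}
```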

> I suspect it might only work in UEFI mode where we probably don't
> actually have a dependency on VGA routing.  This is essentially why
> vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> broken to use on Intel systems with IGD.  Thanks,


What you describe here is a side effect of VGA compatibility,
but I'm focused on the arbitration itself. I don't think the VGA routing
hardware features need to be kept nowadays, except where hardware vendors
want to keep backward compatibility and/or comply with the PCI VGA-compatible spec.


> Alex
>
Alex Williamson Sept. 5, 2023, 4:39 p.m. UTC | #11
On Wed, 6 Sep 2023 00:21:09 +0800
suijingfeng <suijingfeng@loongson.cn> wrote:

> Hi,
> 
> On 2023/9/5 22:52, Alex Williamson wrote:
> > On Tue,  5 Sep 2023 03:57:15 +0800
> > Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
> >  
> >> From: Sui Jingfeng <suijingfeng@loongson.cn>
> >>
> >> On a machine with multiple GPUs, a Linux user has no control over which
> >> one is primary at boot time. This series tries to solve above mentioned
> >> problem by introduced the ->be_primary() function stub. The specific
> >> device drivers can provide an implementation to hook up with this stub by
> >> calling the vga_client_register() function.
> >>
> >> Once the driver bound the device successfully, VGAARB will call back to
> >> the device driver. To query if the device drivers want to be primary or
> >> not. Device drivers can just pass NULL if have no such needs.
> >>
> >> Please note that:
> >>
> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
> >>     like to mount at least three video cards.
> >>
> >> 2) Typically, those non-86 machines don't have a good UEFI firmware
> >>     support, which doesn't support select primary GPU as firmware stage.
> >>     Even on x86, there are old UEFI firmwares which already made undesired
> >>     decision for you.
> >>
> >> 3) This series is attempt to solve the remain problems at the driver level,
> >>     while another series[1] of me is target to solve the majority of the
> >>     problems at device level.
> >>
> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> >> 630 is the default boot VGA, successfully override by ast2400 with
> >> ast.modeset=10 append at the kernel cmd line.
> >>
> >> $ lspci | grep VGA
> >>
> >>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]  
> > In all my previous experiments with VGA routing and IGD I found that
> > IGD can't actually release VGA routing and Intel confirmed the hardware
> > doesn't have the ability to do so.  It will always be primary from a
> > VGA routing perspective.  Was this actually tested with non-UEFI?  
> 
> Yes, I have tested on my aspire e471 notebook (i5 5200U),
> because that notebook using legacy firmware (also have UEFI, double firmware).
> But this machine have difficult in install ubuntu under UEFI firmware in the past.
> So I keep it using the legacy firmware.
> 
> It have two video card, IGD and nvidia video card(GFORCE 840M).
> nvidia call its video card as 3D controller (pci->class = 0x030200)
> 
> I have tested this patch and another patch mention at [1] together.
> I can tell you that the firmware framebuffer of this notebook using vesafb, not efifb.
> And the framebuffer size (lfb.size) is very small. This is very strange,
> but I don't have enough time to look in details. But still works.
> 
> I'm using and tesing my patch whenever and wherever possible.

So you're testing VGA routing using a non-VGA 3D controller through the
VESA address space?  How does that test anything about VGA routing?

> > I suspect it might only work in UEFI mode where we probably don't
> > actually have a dependency on VGA routing.  This is essentially why
> > vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> > broken to use on Intel systems with IGD.  Thanks,  
> 
> 
> What you tell me here is the side effect come with the VGA-compatible,
> but I'm focus on the arbitration itself. I think there no need to keep
> the VGA routing hardware features nowadays except that hardware vendor
> want keep the backward compatibility and/or comply the PCI VGA compatible spec.

"VGA arbitration" is the mediation of VGA routing between devices, so
I'm confused how you can be focused on the arbitration without the
routing itself.  Thanks,
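To make the point concrete: the resources being arbitrated are the fixed legacy VGA windows (memory 0xA0000-0xBFFFF, 128 KiB, and I/O ports 0x3B0-0x3BB and 0x3C0-0x3DF), which every VGA-compatible device decodes at the same addresses. Below is a toy model, with hypothetical names, of an arbiter that routes those windows to one device; the `can_release` flag models the reported IGD behaviour of never releasing VGA routing. It is not the vgaarb implementation, which also handles bridges, locking and per-resource ownership.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed legacy VGA windows shared by all VGA-compatible devices. */
#define VGA_LEGACY_MEM_START 0xA0000u
#define VGA_LEGACY_MEM_END   0xBFFFFu /* inclusive: 128 KiB */

static bool is_legacy_vga_mem(uint32_t addr)
{
	return addr >= VGA_LEGACY_MEM_START && addr <= VGA_LEGACY_MEM_END;
}

static bool is_legacy_vga_io(uint16_t port)
{
	return (port >= 0x3B0 && port <= 0x3BB) ||
	       (port >= 0x3C0 && port <= 0x3DF);
}

/* Toy arbiter state (hypothetical): decodes[i] says whether device i
 * currently answers on the legacy windows; can_release[i] says whether
 * it can be told to stop. */
struct toy_arbiter {
	bool decodes[8];
	bool can_release[8];
	size_t ndev;
	int owner;
};

/* Route the legacy windows to 'dev'. Every other device is asked to stop
 * decoding; one that cannot release keeps decoding. Returns true only if
 * exactly one device ends up decoding, i.e. the routing is exclusive. */
static bool toy_arbiter_route(struct toy_arbiter *arb, int dev)
{
	size_t active = 0;

	for (size_t i = 0; i < arb->ndev; i++) {
		if ((int)i == dev)
			arb->decodes[i] = true;
		else if (arb->can_release[i])
			arb->decodes[i] = false;
		if (arb->decodes[i])
			active++;
	}
	arb->owner = dev;
	return active == 1;
}
```

In this model, routing away from a device that cannot release leaves two devices decoding the same addresses, which is why arbitration cannot be separated from routing.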

Alex
Sui Jingfeng Sept. 6, 2023, 2:14 a.m. UTC | #12
Hi,


On 2023/9/5 23:05, Thomas Zimmermann wrote:
> However, on modern Linux systems the primary display does not really 
> exist.


No, it does exist. The X server needs to know which GPU is the primary one.
In the log below, the '*' character in front of the (4@0:0:0) PCI device
denotes the primary.

(II) xfree86: Adding drm device (/dev/dri/card2)
(II) xfree86: Adding drm device (/dev/dri/card0)
(II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.5/0000:003:00.0/0000:04:00.0/drm/card0
(II) xfree86: Adding drm device (/dev/dri/card3)
(II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.6/0000:005:00.0/drm/card3
(--) PCI: (0@0:2:0) 8086:3e91:8086:3e91 rev 0, Mem @ 0xdb000000/167777216, 0xa0000000/536870912, I/O @ 0x0000f000/64, BIOS @ 0x????????/131072
(--) PCI: (1@0:0:0) 1002:6771:1043:8636 rev 0, Mem @ 0xc0000000/2688435456, 0xdf220000/131072, I/O @ 0x0000e000/256, BIOS @ 0x????????/131072
(--) PCI:*(4@0:0:0) 1a03:2000:1a03:2000 rev 48, Mem @ 0xde000000/166777216, 0xdf020000/131072, I/O @ 0x0000c000/128, BIOS @ 0x????????/131072
(--) PCI: (5@0:0:0) 10de:1288:174b:b324 rev 161, Mem @ 0xdc000000/116777216, 0xd0000000/134217728, 0xd8000000/33554432, I/O @ 0x0000b000/128, BIOS @ 0x????????/524288

The X server's modesetting driver creates its framebuffer on the primary video adapter.
If a 2D video adapter (like the ASPEED BMC) is not the primary, it will probably not
be used. Its only chance to display anything is to function as an output slave,
and output slaving requires PRIME support for cross-driver buffer sharing.

So there is a real difference between primary and non-primary video adapters.
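Where the X server's '*' comes from can be checked from userspace. As far as I know, libpciaccess's `pci_device_is_boot_vga()` reads the sysfs `boot_vga` attribute on Linux, and the kernel fills that attribute from VGAARB's default device, so changing VGAARB's choice is what changes the primary X picks. A minimal sketch, assuming the standard `/sys/bus/pci/devices/<address>/boot_vga` layout; `find_boot_vga()` is a hypothetical helper.

```c
#include <dirent.h>
#include <stdio.h>

/* Scan <root>/<pci address>/boot_vga (normally root is
 * "/sys/bus/pci/devices") and copy the address of the device whose
 * attribute reads "1" into 'out'. Returns 0 on success, -1 otherwise. */
static int find_boot_vga(const char *root, char *out, size_t outlen)
{
	DIR *dir = opendir(root);
	struct dirent *de;
	int ret = -1;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char path[512];
		char buf[4] = "";
		FILE *f;

		if (de->d_name[0] == '.')
			continue; /* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/%s/boot_vga", root, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue; /* no boot_vga file: not a VGA-class device */
		if (fgets(buf, sizeof(buf), f) && buf[0] == '1') {
			snprintf(out, outlen, "%s", de->d_name);
			ret = 0;
		}
		fclose(f);
		if (ret == 0)
			break;
	}
	closedir(dir);
	return ret;
}
```

From a shell, `cat /sys/bus/pci/devices/0000:04:00.0/boot_vga` gives the same answer for a single device.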


> 'Primary' is the device that is available via VGA, VESA or EFI. Our 
> drivers don't use these interfaces, but the native registers. As you 
> said yourself, these firmware devices (VGA, VESA, EFI) are removed 
> ASAP by the native drivers.
Sui Jingfeng Sept. 6, 2023, 2:34 a.m. UTC | #13
On 2023/9/5 23:05, Thomas Zimmermann wrote:
> Hi
>
> Am 05.09.23 um 15:30 schrieb suijingfeng:
>> Hi,
>>
>>
>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>
>>>> On a machine with multiple GPUs, a Linux user has no control over 
>>>> which
>>>> one is primary at boot time. This series tries to solve above 
>>>> mentioned
>>>
>>> If anything, the primary graphics adapter is the one initialized by 
>>> the firmware. I think our boot-up graphics also make this assumption 
>>> implicitly.
>>>
>>
>> Yes, but by the time of DRM drivers get loaded successfully,the 
>> boot-up graphics already finished.
>> Firmware framebuffer device already get killed by the 
>> drm_aperture_remove_conflicting_pci_framebuffers()
>> function (or its siblings). So, this series is definitely not to 
>> interact with the firmware framebuffer
>
> Yes and no. The helpers you mention will attempt to remove the 
> firmware framebuffer on the given PCI device. If you have multiple PCI 
> devices, the other devices would not be affected.
>
Yes and no.


For the yes part: drm_aperture_remove_conflicting_pci_framebuffers() only kills the conflicting one.
But on a given machine with modern UEFI firmware,
there should be only one firmware framebuffer driver,
namely EFIFB (UEFI GOP). I do have multiple PCI devices,
but I don't understand when and why a system would have more than one firmware framebuffer.

Even on machines with a legacy BIOS, the fixed VGA aperture address range
can only be owned by one firmware driver. We just need to handle the
routing, and the ->set_decode() callback of vga_client_register() does that
work. Am I correct?
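The routing role of the set_decode callback can be sketched as a userspace model. In recent kernels the callback passed to `vga_client_register()` takes a `struct pci_dev *` and a `bool` and returns the resources the device still decodes; the pattern below follows what drivers such as i915 do. The `VGA_RSRC_*` values are re-declared locally and should be checked against `include/linux/vgaarb.h`; `struct toy_dev` and `toy_set_decode()` are hypothetical.

```c
#include <stdbool.h>

/* Local copies of the VGAARB resource flags (see include/linux/vgaarb.h). */
#define VGA_RSRC_NONE        0x00
#define VGA_RSRC_LEGACY_IO   0x01
#define VGA_RSRC_LEGACY_MEM  0x02
#define VGA_RSRC_NORMAL_IO   0x04
#define VGA_RSRC_NORMAL_MEM  0x08

/* Hypothetical device state: whether legacy VGA decoding is enabled. */
struct toy_dev {
	bool legacy_decode_enabled;
};

/* Model of a set_decode callback. The arbiter calls it with state = true
 * when this device should decode the legacy VGA ranges and state = false
 * when another device should get them. The return value tells the arbiter
 * which resources the device still decodes. */
static unsigned int toy_set_decode(struct toy_dev *dev, bool state)
{
	/* A real driver would program its hardware here; this model only
	 * records the request. */
	dev->legacy_decode_enabled = state;

	if (state)
		return VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM |
		       VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
	return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
}
```

The normal (BAR-based) resources stay decoded either way; only the legacy ranges move between devices.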
Sui Jingfeng Sept. 6, 2023, 3:08 a.m. UTC | #14
Hi,


On 2023/9/5 23:05, Thomas Zimmermann wrote:
> However, on modern Linux systems the primary display does not really 
> exist. 'Primary' is the device that is available via VGA, VESA or EFI. 

I may be missing the point: what do you mean by the word "modern"?
Are you saying that the X server is too old and Wayland is the modern display server?


> Our drivers don't use these interfaces, but the native registers.


Yes and no?

Yes for machines with UEFI firmware,
but I'm not sure this statement holds for machines with legacy firmware.

The display controller in the ASPEED BMC is VGA-compatible.
Therefore, in theory, it should work with the VGA console on a machine
with another VGA-compatible video card, so the ast_vga_set_decode() function
provided in patch 0007 is probably useful in a legacy firmware environment.

To be honest, I have tested this on various machines with UEFI firmware,
but I didn't realize I should also test in a legacy firmware environment
before sending this patch. The testing effort needed seems quite
exhausting, since all my machines come with UEFI firmware.

So is it OK to leave the legacy part to someone else who is interested in it?
Perhaps Alex is more of an expert on legacy VGA routing?
:-)
Sui Jingfeng Sept. 6, 2023, 3:51 a.m. UTC | #15
Hi,


On 2023/9/5 22:52, Alex Williamson wrote:
> On Tue,  5 Sep 2023 03:57:15 +0800
> Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
>
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> On a machine with multiple GPUs, a Linux user has no control over which
>> one is primary at boot time. This series tries to solve above mentioned
>> problem by introduced the ->be_primary() function stub. The specific
>> device drivers can provide an implementation to hook up with this stub by
>> calling the vga_client_register() function.
>>
>> Once the driver bound the device successfully, VGAARB will call back to
>> the device driver. To query if the device drivers want to be primary or
>> not. Device drivers can just pass NULL if have no such needs.
>>
>> Please note that:
>>
>> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
>>     like to mount at least three video cards.
>>
>> 2) Typically, those non-86 machines don't have a good UEFI firmware
>>     support, which doesn't support select primary GPU as firmware stage.
>>     Even on x86, there are old UEFI firmwares which already made undesired
>>     decision for you.
>>
>> 3) This series is attempt to solve the remain problems at the driver level,
>>     while another series[1] of me is target to solve the majority of the
>>     problems at device level.
>>
>> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
>> 630 is the default boot VGA, successfully override by ast2400 with
>> ast.modeset=10 append at the kernel cmd line.
>>
>> $ lspci | grep VGA
>>
>>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
> In all my previous experiments with VGA routing and IGD I found that
> IGD can't actually release VGA routing and Intel confirmed the hardware
> doesn't have the ability to do so.

Which IGD model are you using? Even for the IGD in the Atom D2550,
the legacy 128 KB VGA memory range can be configured to map either to the IGD
or to the DMI interface. See section 1.7.3.2 of the N2000 datasheet [1].

If a specific Intel model has a bug in its VGA routing logic,
I would prefer to ignore it, or to switch to UEFI firmware on such hardware.

That is the hardware engineers' responsibility; I will not worry about it.
Thanks for telling me this.

[1] https://www.intel.com/content/dam/doc/datasheet/atom-d2000-n2000-vol-2-datasheet.pdf


>   It will always be primary from a
> VGA routing perspective.  Was this actually tested with non-UEFI?


As you said, the generous Intel has already confirmed the hardware defect.
So this is probably a good opportunity to switch to UEFI to solve the problem;
then no legacy testing is needed.


> I suspect it might only work in UEFI mode where we probably don't
> actually have a dependency on VGA routing.  This is essentially why
> vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> broken to use on Intel systems with IGD.  Thanks,

Thanks for telling me this.

To be honest, I have only tested my patch on machines with UEFI firmware,
since UEFI has become mainstream. If this patch is useful on the
majority of machines, I'm satisfied. The results are not too bad.

Thanks.

> Alex
>
Sui Jingfeng Sept. 6, 2023, 4:14 a.m. UTC | #16
Hi,

On 2023/9/5 23:05, Thomas Zimmermann wrote:
> You might have found a bug in the ast driver. Ast has means to detect 
> if the device has been POSTed and maybe do that. If this doesn't work 
> correctly, it needs a fix.
>
That sounds fine.

The bug is not a big deal; I just took it as an example and reported it to you.
But a real fix could be complex, because quite a lot of servers
ship with ASPEED BMC hardware.

Honestly, I don't have the time to fix it properly.
I already have tons of patches pending, and I will focus on solving VGAARB-related problems.


I also want to test your patch occasionally,
so this series is useful for me in corner cases.
Christian König Sept. 6, 2023, 6:45 a.m. UTC | #17
On 05.09.23 at 15:30, suijingfeng wrote:
> Hi,
>
>
> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>> Hi
>>
>> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>
>>> On a machine with multiple GPUs, a Linux user has no control over which
>>> one is primary at boot time. This series tries to solve above mentioned
>>
>> If anything, the primary graphics adapter is the one initialized by 
>> the firmware. I think our boot-up graphics also make this assumption 
>> implicitly.
>>
>
> Yes, but by the time the DRM drivers are loaded successfully, the
> boot-up graphics have already finished.

This is an incorrect assumption.

drm_aperture_remove_conflicting_pci_framebuffers() and co don't kill the 
framebuffer, they just remove the current framebuffer driver to avoid 
further updates.

So what happens (at least for amdgpu) is that we take over the
framebuffer, including both the mode and its contents, and provide a new
framebuffer interface until DRM masters like X or Wayland take over.

> The firmware framebuffer device has already been killed by the
> drm_aperture_remove_conflicting_pci_framebuffers() function (or its
> siblings). So this series is definitely not meant to interact with the
> firmware framebuffer (or more intelligent framebuffer drivers). It is
> for user-space programs, such as the X server and Wayland compositors.
> It is for Linux users and DRM driver testers, allowing them to direct
> the graphical display server to use the hardware of interest as the
> primary video card.
>
> Also, I believe that the X server and Wayland compositors are the best
> test cases. If a specific DRM driver can't work as the primary with the
> X server, then something is probably wrong.
>
>
>> But what's the use case for overriding this setting?
>>
>
> On a machine with multiple GPUs mounted, only the primary graphics
> card gets POSTed (initialized) by the firmware. Therefore, the DRM
> drivers for the remaining video cards have to work without the
> prerequisite setup done by the firmware, which is called POST.

Well, you don't seem to understand the background here. This is 
perfectly normal behavior.

Secondary cards are posted after loading the appropriate DRM driver. At 
least for amdgpu this is done by calling the appropriate functions in 
the BIOS.

>
> One use case of this series is to test whether a specific DRM driver
> works properly even when the firmware has done none of the
> prerequisite work at all. And it seems the results are not satisfying
> in all cases.
>
> drm/ast is the first DRM driver that refused to work if it had not
> been POSTed by the firmware.

As far as I know this is expected as well. AST is a relatively simple 
driver and when it's not the primary one during boot the assumption is 
that it isn't used at all.

Regards,
Christian.

>
> Before applying this series, I was unable to easily make drm/ast the
> primary video card. In a multiple video card configuration, the monitor
> connected to the AST2400 did not light up. Confused, a naive programmer
> might suspect that PRIME is not working.
>
> After applying this series and passing ast.modeset=10 on the kernel
> command line, I found that the monitor connected to my AST2400 video
> card was still black; it did not display any image.
>
> While studying drm/ast, I learned that the drm/ast driver ships POST
> code (see the ast_post_gpu() function), so I wondered why this function
> did not work. After some short (hasty) debugging, I found that
> ast_post_gpu() did not get run, and that this had something to do with
> ast->config_mode.
>
> Without thinking too much, I hardcoded ast->config_mode to ast_use_p2a
> to force ast_post_gpu() to run.
>
> ```
>
> --- a/drivers/gpu/drm/ast/ast_main.c
> +++ b/drivers/gpu/drm/ast/ast_main.c
> @@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device *ast)
>                 }
>         }
>
> +       ast->config_mode = ast_use_p2a;
> +
>         switch (ast->config_mode) {
>         case ast_use_defaults:
>                 drm_info(dev, "Using default configuration\n");
>
> ```
>
> Then the monitor lit up and displayed the Ubuntu greeter.
> Therefore, my patch is helpful, at least for Linux DRM driver testers
> and developers. It allows programmers to test a specific part of a
> specific driver without changing a line of source code and without
> needing sudo privileges. It improves the efficiency of testing and
> patch verification.
>
> I know about the PrimaryGPU option in xorg.conf, but that approach
> persists the setup: you need to modify it with root privileges each
> time you want to switch the primary. When rapidly developing and/or
> testing multiple video drivers with only one computer available, what
> we really want is probably a one-shot command, as this series provides.
>
> So this is the first use case. It probably also helps to test full
> modesets, PRIME and reverse PRIME on multiple video card machines.
>
>
>> Best regards
>> Thomas
>>
>
Christian König Sept. 6, 2023, 6:47 a.m. UTC | #18
On 05.09.23 at 16:28, Sui Jingfeng wrote:
> Hi,
>
> On 2023/9/5 21:28, Christian König wrote:
>>>>
>>>> 2) Typically, those non-86 machines don't have a good UEFI firmware
>>>>    support, which doesn't support select primary GPU as firmware stage.
>>>>    Even on x86, there are old UEFI firmwares which already made undesired
>>>>    decision for you.
>>>>
>>>> 3) This series is attempt to solve the remain problems at the driver level,
>>>>    while another series[1] of me is target to solve the majority of the
>>>>    problems at device level.
>>>>
>>>> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
>>>> 630 is the default boot VGA, successfully override by ast2400 with
>>>> ast.modeset=10 append at the kernel cmd line.
>>> The value 10 is incredibly arbitrary, and multiplied as a magic number
>>> all over the place.
>>
>> +1 
>
>
> This is exactly why I made this series an RFC: it is an open-ended
> problem. The choices 3, 4, 5, 6, 7, 8 and 9 are as arbitrary as '10'.
> '1' and '2' are definitely not suitable, because those values are
> already taken.

Well, you are completely missing the point. *DON'T* abuse the modeset
module parameters for this!

Whether you use 10 or any other value doesn't matter.

Regards,
Christian.

>
> Take drm/nouveau as an example:
>
>
> ```
>
> MODULE_PARM_DESC(modeset, "enable driver (default: auto, "
>                   "0 = disabled, 1 = enabled, 2 = headless)");
> int nouveau_modeset = -1;
> module_param_named(modeset, nouveau_modeset, int, 0400);
>
> ```
>
>
> '1' enables the DRM driver; some drivers even let it override the
> 'nomodeset' parameter.
>
> '2' is not suitable, because nouveau uses it for a headless GPU
> (render-only or compute-class GPU?).
>
> '3' is probably not the best either; the concern is: what if a specific
> DRM driver wants to expand the usage in the future?
>
>
> The reasons I picked the value '10' are:
>
>
> 1) The modeset parameter is unlikely to grow to 10 usages.
>
> Other DRM drivers only use '-1', '0' and '1'; choosing '2' would
> conflict with drm/nouveau. Picking '10' leaves some room for various
> device driver authors. It also helps keep the usage consistent across
> drivers.
>
>
> 2) An int takes up 4 bytes, and I don't want to waste even a single byte.
>
> In defending my patch, I have to say that drafting another kernel
> command-line parameter would waste precious RAM.
>
> An int can encode 2^31 usages; why can't we improve the utilization rate?
>
> 3) Please consider that modeset is the most common and attractive
> parameter.
>
> No name is better than 'modeset', as other names are not easy to
> remember.
>
> Again, this is for Linux users, so it is not arbitrary. Although it is
> simple and trivial, I thought about it for more than a week.
>
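To make the value-collision argument above concrete, here is a hypothetical decoder for an overloaded per-driver `modeset` integer. The meanings of -1, 0, 1 and nouveau's 2 follow the text above; 10 is the value this RFC proposes. None of this is existing kernel code.

```c
#include <assert.h>

/* Hypothetical meanings of an overloaded per-driver "modeset" int. */
enum modeset_mode {
	MODESET_AUTO,     /* -1: let the driver decide */
	MODESET_DISABLED, /*  0 */
	MODESET_ENABLED,  /*  1 */
	MODESET_HEADLESS, /*  2: nouveau only */
	MODESET_PRIMARY,  /* 10: proposed by this RFC */
	MODESET_UNKNOWN,
};

static enum modeset_mode decode_modeset(int value)
{
	switch (value) {
	case -1: return MODESET_AUTO;
	case 0:  return MODESET_DISABLED;
	case 1:  return MODESET_ENABLED;
	case 2:  return MODESET_HEADLESS;
	case 10: return MODESET_PRIMARY;
	default: return MODESET_UNKNOWN; /* 3..9 and everything else */
	}
}
```

Every driver would have to agree on such a table for a value to mean the same thing everywhere, which is part of why the reviewers in this thread object to reusing `modeset` at all.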
Thomas Zimmermann Sept. 6, 2023, 7 a.m. UTC | #19
Hi

On 06.09.23 at 04:14, suijingfeng wrote:
> Hi,
> 
> 
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> However, on modern Linux systems the primary display does not really 
>> exist.
> 
> 
> No, it does exist. The X server needs to know which one is the primary
> GPU. The '*' character in front of the (4@0:0:0) PCI device denotes the
> primary; see the log below.
> 
> (II) xfree86: Adding drm device (/dev/dri/card2)
> (II) xfree86: Adding drm device (/dev/dri/card0)
> (II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.5/0000:003:00.0/0000:04:00.0/drm/card0
> (II) xfree86: Adding drm device (/dev/dri/card3)
> (II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.6/0000:005:00.0/drm/card3
> (--) PCI: (0@0:2:0) 8086:3e91:8086:3e91 rev 0, Mem @ 0xdb000000/167777216, 0xa0000000/536870912, I/O @ 0x0000f000/64, BIOS @ 0x????????/131072
> (--) PCI: (1@0:0:0) 1002:6771:1043:8636 rev 0, Mem @ 0xc0000000/2688435456, 0xdf220000/131072, I/O @ 0x0000e000/256, BIOS @ 0x????????/131072
> (--) PCI:*(4@0:0:0) 1a03:2000:1a03:2000 rev 48, Mem @ 0xde000000/166777216, 0xdf020000/131072, I/O @ 0x0000c000/128, BIOS @ 0x????????/131072
> (--) PCI: (5@0:0:0) 10de:1288:174b:b324 rev 161, Mem @ 0xdc000000/116777216, 0xd0000000/134217728, 0xd8000000/33554432, I/O @ 0x0000b000/128, BIOS @@0x????????/524288
> 
> The X server's modesetting driver creates the framebuffer on the
> primary video adapter. If a 2D video adapter (like the ASpeed BMC) is
> not the primary, it will probably not be used. Its only chance to
> display something is to function as an output slave, but output slaves
> need PRIME support for cross-driver buffer sharing.
>
> So there is some difference between primary and non-primary video
> adapters.

Xorg is a pretty bad example, because X parses the PCI bus and then 
tries to match devices to /dev/dri/ files. That's also not fixable in 
Xorg's current code base. Please don't promote Xorg's design. It dates 
back to the time when Xorg did the modesetting by itself.

Userspace should just open existing device files and start rendering.
Maybe pick the previous settings and/or do some guesswork about the
arrangement of these devices. AFAIK that's what the modern compositors do.

Best regards
Thomas

> 
> 
>> 'Primary' is the device that is available via VGA, VESA or EFI. Our 
>> drivers don't use these interfaces, but the native registers. As you 
>> said yourself, these firmware devices (VGA, VESA, EFI) are removed 
>> ASAP by the native drivers. 
>
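The `*` marker in the Xorg log quoted above can be picked out mechanically. The C sketch below assumes the `(--) PCI:*(bus@domain:dev:func) vendor:device:...` line format seen in that log. `parse_primary_pci()` is a hypothetical helper, not part of Xorg, and as Thomas notes, newer userspace is not expected to depend on this notion of a primary device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` is the Xorg PCI entry flagged with '*',
 * copy its "bus@domain:dev:func" id into busid and parse the first two
 * hex fields that follow (vendor and device). Returns true on a match. */
static bool parse_primary_pci(const char *line, char *busid, size_t busid_len,
			      unsigned *vendor, unsigned *device)
{
	static const char marker[] = "PCI:*(";
	const char *p, *end;
	size_t len;

	assert(line && busid && vendor && device);
	p = strstr(line, marker);
	if (!p)
		return false; /* not the entry flagged as primary */
	p += sizeof(marker) - 1;

	end = strchr(p, ')');
	if (!end)
		return false;
	len = (size_t)(end - p);
	if (len >= busid_len)
		return false; /* bus id would not fit */
	memcpy(busid, p, len);
	busid[len] = '\0';

	/* Next come "vvvv:dddd:ssss:ssss"; keep the first two hex fields. */
	return sscanf(end + 1, " %x:%x", vendor, device) == 2;
}
```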
Thomas Zimmermann Sept. 6, 2023, 7:18 a.m. UTC | #20
Hi

On 06.09.23 at 04:34, suijingfeng wrote:
> 
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> Hi
>>
>> On 05.09.23 at 15:30, suijingfeng wrote:
>>> Hi,
>>>
>>>
>>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>>
>>>>> On a machine with multiple GPUs, a Linux user has no control over which
>>>>> one is primary at boot time. This series tries to solve above mentioned
>>>>
>>>> If anything, the primary graphics adapter is the one initialized by 
>>>> the firmware. I think our boot-up graphics also make this assumption 
>>>> implicitly.
>>>>
>>>
>>> Yes, but by the time the DRM drivers are loaded successfully, the
>>> boot-up graphics have already finished.
>>> The firmware framebuffer device has already been killed by the
>>> drm_aperture_remove_conflicting_pci_framebuffers() function (or its
>>> siblings). So this series is definitely not meant to interact with the
>>> firmware framebuffer
>>
>> Yes and no. The helpers you mention will attempt to remove the 
>> firmware framebuffer on the given PCI device. If you have multiple PCI 
>> devices, the other devices would not be affected.
>>
> Yes and no.
> 
> 
> For the yes part: drm_aperture_remove_conflicting_pci_framebuffers()
> only kills the conflicting one. But on a machine with modern UEFI
> firmware, there should be only one firmware framebuffer driver, namely
> EFIFB (UEFI GOP). I do have multiple PCI devices, but I don't
> understand when and why a system would have more than one firmware
> framebuffer.

Maybe somewhat unrelated to the actual discussion, but it's not as
simple as you assume. Many non-X86 systems use DeviceTree. On Sparc,
IIRC, there's the case of having multiple firmware framebuffers listed
in the DT. We create a device for each and attach a DRM firmware
driver; ofdrm in this case. I haven't seen this in the wild, but
non-Sparc systems could also behave like that.

And in addition to that, ARM-based systems often use UEFI boot stub
code that provides a simple UEFI environment to the kernel. For graphics
we've had cases where we received the same firmware framebuffer from the
DT and from the UEFI boot stub. We have to detect and handle such
duplication in the kernel.

Best regards
Thomas

> 
> Even on machines with a legacy BIOS, the fixed VGA aperture address
> range can only be owned by one firmware driver. We just need to handle
> the routing, and the ->set_decode() callback of vga_client_register()
> is used for that. Am I correct?
> 
>
Thomas Zimmermann Sept. 6, 2023, 7:46 a.m. UTC | #21
Hi

On 06.09.23 at 05:08, suijingfeng wrote:
> Hi,
> 
> 
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> However, on modern Linux systems the primary display does not really 
>> exist. 'Primary' is the device that is available via VGA, VESA or EFI. 
> 
> I may be missing the point: what do you mean by choosing the word
> "modern"? Are you trying to tell me that the X server is too old and
> Wayland is the modern display server?

It comes down to that. Xorg's device handling is out of date. Fixing it 
would require a redesign of the whole program. A 'modern' compositor 
delegates device handling to the kernel. All it does is to open the 
device files and use the provided functionality. I've briefly mentioned 
this in the other email.

There's more to 'modern', such as 'uses Wayland for compositing', 'Mesa 
for direct rendering' or 'does atomic modesetting'. But that's all 
unrelated here.

> 
> 
>> Our drivers don't use these interfaces, but the native registers.
> 
> 
> Yes and no?
> 
> Yes for machines with UEFI firmware, but I'm not sure this statement
> is true for machines with legacy firmware.

What I mean is: the primary device is the one that owns the VGA/VESA/EFI
I/O space. But DRM drivers don't program via VGA registers or VESA/EFI
calls. They use the hardware's actual native registers in each
device's I/O space. So each device operates on its own. They (usually)
don't have to share/arbitrate access to the VGA registers.

Hence the idea of a primary device does not make sense here. It's useful 
to pick an initial default, but further display setup should rather be 
left to userspace.

> 
> The display controller in the ASpeed BMC is VGA compatible. Therefore,
> in theory, it should work with the VGA console on a machine with another
> VGA-compatible video card. So the ast_vga_set_decode() function provided
> in patch 0007 is probably useful in legacy firmware environments.
>
> To be honest, I have tested this on various machines with UEFI firmware,
> but I didn't realize that I should test in a legacy firmware environment
> before sending this patch. The testing effort needed seems quite
> exhausting, since all my machines come with UEFI firmware.
>
> So is it OK to leave the legacy part to someone else who is interested
> in it? Perhaps Alex has more expertise in legacy VGA routing?

Maybe you can describe the user's problem to us. TBH I still don't
understand what you're trying to solve. If you want to set the console's
initial output device, you can make a parameter in vgaarb. But I also
don't really see a need for that either.

Best regards
Thomas

> :-)
> 
>
Thomas Zimmermann Sept. 6, 2023, 8:05 a.m. UTC | #22
Hi

On 05.09.23 at 17:59, suijingfeng wrote:
[...]
>> FYI: per-driver modeset parameters are deprecated and not to be used. 
>> Please don't promote them.
> 
> 
> Well, please wait, I want to explain.
> 
> 
> 
> drm/nouveau already promotes it a little bit.
>
> Although there is no code of conduct or specification governing what
> module parameters should look like, I noticed that a lot of DRM drivers
> already support the modeset parameter,

Please look at the history and discussion around this parameter. To my
knowledge, 'modeset' got introduced when modesetting was still done in
userspace. It was an easy way of disabling the kernel driver if the
system's Xorg did not yet support kernel mode setting.

Fast forward a few years and all Linux systems use kernel modesetting,
which makes the modeset parameters obsolete. We discussed and decided to
keep them in, because many articles and blog posts refer to them. We
didn't want to invalidate them. BUT modeset is deprecated and not allowed
in new code. If you look at existing modeset usage, you will eventually
come across the comment at [1].

There's 'nomodeset', which disables all native drivers. It's useful for 
debugging or as a quick-fix if the graphics driver breaks. If you want 
to disable a specific driver, please use one of the options for 
blacklisting.
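A rough sketch of how a `modprobe.blacklist=a,b,c` entry on the kernel command line can be matched against a module name. This is an illustration, not kmod's parser: quoting and `-`/`_` normalization are deliberately left out, and `cmdline_blacklists()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does the command line blacklist `module` through
 * modprobe.blacklist=a,b,c? Simplified on purpose: parameters are split on
 * single spaces, and there is no quote handling or '-'/'_' folding. */
static bool cmdline_blacklists(const char *cmdline, const char *module)
{
	static const char key[] = "modprobe.blacklist=";
	const char *p = cmdline;
	size_t mlen;

	assert(cmdline && module && *module);
	mlen = strlen(module);
	while ((p = strstr(p, key)) != NULL) {
		if (p != cmdline && p[-1] != ' ') {
			p++; /* key found mid-word, keep searching */
			continue;
		}
		p += sizeof(key) - 1;
		/* Walk the comma-separated list until the parameter ends. */
		while (*p && *p != ' ') {
			size_t len = strcspn(p, ", ");

			if (len == mlen && strncmp(p, module, mlen) == 0)
				return true;
			p += len;
			if (*p == ',')
				p++;
		}
	}
	return false;
}
```

Keep in mind that `modprobe.blacklist` is read by modprobe in userspace, so it only affects drivers loaded as modules through modprobe; a driver built into the kernel is not affected by it, which is the case `initcall_blacklist` covers.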

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83


> and their authors try to keep the usages from conflicting with each
> other. I believe this is a good thing for Linux users. It is probably
> the responsibility of the DRM core maintainers to make the various DRM
> drivers reach a minimal consensus. That is probably painful and may not
> pay off, but reaching a minimal consensus does benefit Linux users.
> 
> 
>> You can use modprobe.blacklist or initcall_blacklist on the kernel 
>> command line.
>>
> There are some cases where modprobe.blacklist doesn't work; I have come
> across them several times in the past. The device selected by VGAARB is
> a device-level thing, not the driver's problem.
>
> Sometimes, when VGAARB has a bug, it selects the wrong device as primary.
> The X server then uses this wrong device as primary and crashes
> completely, due to the lack of a driver. Take my old S3 Graphics card as
> an example:
> 
> $ lspci | grep VGA
> 
>   00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
>   03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
>   07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
>   08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
> 
> Before applying this patch:
> 
> [    0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [    0.361753] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    0.361765] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361773] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361779] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361781] vgaarb: loaded
> [    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
> [    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
> [    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070
> 
> 
> For some reason, one of my systems selects the S3 Graphics card as the
> primary GPU, but this S3 Graphics card doesn't even have a decent
> upstream DRM driver yet. In such a case, I have come to believe that
> only a device that has a driver deserves to be the primary.
>
> Under such conditions, I want to reboot and enter the graphical
> environment with one of the other working video cards, either the
> platform-integrated or the discrete GPU. That doesn't mean I should
> compromise by unmounting the S3 Graphics card from the motherboard, nor
> that I should change my BIOS settings; sometimes the BIOS is even worse.
> 
> With this series applied, all I need to do is reboot the computer and
> pass a command-line option. By forcibly overriding another video card
> (one with decent driver support) as primary, I'm able to debug in a
> graphical environment. I would like to examine what's wrong with VGAARB
> on a specific platform under the X server's graphical environment.
>
> I could also try compiling a driver for this card and see whether it
> works, simply rebooting without changing anything. It is very efficient.
> So this is probably the second use of my patch: it hands control back
> to the graphics developer.
> 
>
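To check which device vgaarb ended up treating as the default VGA device, Linux exposes a `boot_vga` attribute under each VGA-class PCI device in sysfs (`/sys/bus/pci/devices/<device>/boot_vga`), which reads 1 for that device. The C sketch below scans for it; the helper names are hypothetical, and the parsing assumes the attribute holds a single digit.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Read one boot_vga attribute file. Returns 1 if it starts with '1',
 * 0 if it holds anything else, and -1 if it cannot be opened (for
 * example because the device is not VGA class). */
static int read_boot_vga(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	return c == '1' ? 1 : 0;
}

/* Scan a directory laid out like /sys/bus/pci/devices and copy the name
 * of the first device whose boot_vga reads 1 into out.
 * Returns 0 if one was found, -1 otherwise. */
static int find_boot_vga(const char *root, char *out, size_t out_len)
{
	char path[512];
	struct dirent *de;
	DIR *dir;
	int ret = -1;

	assert(out != NULL && out_len > 0);
	dir = opendir(root);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue; /* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/%s/boot_vga", root, de->d_name);
		if (read_boot_vga(path) == 1) {
			snprintf(out, out_len, "%s", de->d_name);
			ret = 0;
			break;
		}
	}
	closedir(dir);
	return ret;
}
```

On a live system one would call `find_boot_vga("/sys/bus/pci/devices", name, sizeof(name))`; on the machine from the log above it would be expected to name the device vgaarb settled on after the overrides.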
Sui Jingfeng Sept. 6, 2023, 9:08 a.m. UTC | #23
Hi,


On 2023/9/6 14:45, Christian König wrote:
> On 05.09.23 at 15:30, suijingfeng wrote:
>> Hi,
>>
>>
>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>
>>>> On a machine with multiple GPUs, a Linux user has no control over which
>>>> one is primary at boot time. This series tries to solve above mentioned
>>>
>>> If anything, the primary graphics adapter is the one initialized by 
>>> the firmware. I think our boot-up graphics also make this assumption 
>>> implicitly.
>>>
>>
>> Yes, but by the time the DRM drivers are loaded successfully, the
>> boot-up graphics have already finished.
>
> This is an incorrect assumption.
>
> drm_aperture_remove_conflicting_pci_framebuffers() and co don't kill 
> the framebuffer, 

Well, my original description of this technical point was:

1) "Firmware framebuffer device already get killed by the drm_aperture_remove_conflicting_pci_framebuffers() function (or its siblings)"
2) "By the time of DRM drivers get loaded successfully, the boot-up graphics already finished."

The word "killed" here is a rough, coarse description of
how the DRM device driver takes over the firmware framebuffer.
Since something seems to be obscuring our communication,
let's make things clear. See below for a more elaborate description.


> they just remove the current framebuffer driver to avoid further updates.
>
This statement doesn't sound right. In a UEFI environment,
a correct description is that they remove the platform device, not the framebuffer driver.
On machines with UEFI firmware, the framebuffer driver here definitely refers to efifb,
and efifb still resides in the system (the Linux kernel).

Please see the aperture_detach_platform_device() function in video/aperture.c.

> So what happens (at least for amdgpu) is that we take over the 
> framebuffer,

This statement here is also not an accurate description.

Strictly speaking, drm/amdgpu takes over the device (the VRAM hardware),
not the framebuffer.

The word "take over" here is also dubious, because drm/amdgpu takes over nothing.

From the perspective of the device-driver model, the GPU hardware *belongs* to the amdgpu driver.
Why would you need to take over something that originally belongs to you?

If you build drm/amdgpu into the kernel and make it load before efifb,
then there is no need to use the firmware framebuffer (this discussion
is limited to displaying boot graphics). In such a case, the so-called
"take over" will not happen.

The truth is that efifb creates a platform device, which *occupies*
part of the VRAM hardware resource. Thus efifb and drm/amdgpu
conflict, because they share the same hardware resource. It is the
hardware resources (address ranges) used by the two drivers that
conflict, not the efifb driver itself with the drm/amdgpu driver.

Thus, the drm_aperture_remove_conflicting_xxxxxx() functions have to kill
one of the conflicting devices, not the driver. Therefore,
the correct word would be "reclaim":
drm/amdgpu *reclaims* the hardware resource (VRAM address range) that originally belongs to it.

The modeset state (including the framebuffer content) still resides in the amdgpu device.
You just get the dirty framebuffer image in the framebuffer object,
but the framebuffer object has already been dirty since the UEFI firmware stage.

In conclusion, *reclaim* is more accurate than "take over".
And as far as I understand, drm/amdgpu takes over nothing; it gains nothing.

Well, feel free to correct me if I'm wrong.
Christian König Sept. 6, 2023, 9:40 a.m. UTC | #24
On 06.09.23 at 11:08, suijingfeng wrote:
> Well, feel free to correct me if I'm wrong.

You seem to have some very basic misunderstandings here.

The term framebuffer describes some VRAM memory used for scanout.

This framebuffer is exposed to userspace through some framebuffer 
driver, on UEFI platforms that is usually efifb but can be quite a bunch 
of different drivers.

When the DRM drivers load they remove the previous drivers using 
drm_aperture_remove_conflicting_pci_framebuffers() (or similar 
function), but this does not mean that the framebuffer or scanout 
parameters are modified in any way. It just means that the framebuffer 
is just no longer exposed through this driver.

Take over is the perfectly right description here because that's exactly 
what's happening. The framebuffer configuration including the VRAM 
memory as well as the parameters for scanout are exposed by the newly 
loaded DRM driver.

In other words, userspace can query through the DRM interfaces which
monitors are already driven by the hardware and so, in your terminology,
figure out which one is the primary.

It's just that, as Thomas explained as well, this is completely
irrelevant to any modern desktop. Both X and Wayland iterate over the
available devices and start rendering to them; which one was used during
boot doesn't really matter to them.

Apart from that, ranting like this and trying to explain stuff to people
who obviously have a much better background in the topic is not going to
help get your patches upstream.

Regards,
Christian.
Sui Jingfeng Sept. 6, 2023, 9:48 a.m. UTC | #25
Hi,


On 2023/9/6 16:05, Thomas Zimmermann wrote:
> Hi
>
> On 05.09.23 at 17:59, suijingfeng wrote:
> [...]
>>> FYI: per-driver modeset parameters are deprecated and not to be 
>>> used. Please don't promote them.
>>
>>
>> Well, please wait, I want to explain.
>>
>>
>>
>> drm/nouveau already promotes it a little bit.
>>
>> Although there is no code of conduct or specification governing what
>> module parameters should look like, I noticed that a lot of DRM drivers
>> already support the modeset parameter,
>
> Please look at the history and discussion around this parameter. To my
> knowledge, 'modeset' got introduced when modesetting was still done
> in userspace. It was an easy way of disabling the kernel driver if the
> system's Xorg did not yet support kernel mode setting.
>
> Fast forward a few years and all Linux systems use kernel modesetting,
> which makes the modeset parameters obsolete. We discussed and decided to
> keep them in, because many articles and blog posts refer to them. We
> didn't want to invalidate them. BUT modeset is deprecated and not allowed
> in new code. If you look at existing modeset usage, you will eventually
> come across the comment at [1].
>

OK, no problem. I agree with what you said.


> There's 'nomodeset', which disables all native drivers. It's useful 
> for debugging or as a quick-fix if the graphics driver breaks. If you 
> want to disable a specific driver, please use one of the options for 
> blacklisting.
>
Yeah, 'nomodeset' disables all native drivers;
that is its strength, but also its weakness.

When you are developing a DRM driver for a new device,
you will see the pain. All too often a programmer's modification
makes the entire Linux kernel hang, and the problematic DRM
driver kernel module is already in the initrd. What is really
needed then is to disable only the malfunctioning DRM driver
kernel module, while what you recommend disables them all. There
is a subtle difference.

Another limitation of the 'nomodeset' parameter is that
it is only available on recent upstream kernels; older
downstream kernels don't support this parameter yet.
So this creates an inconsistent development experience. I believe
there are always some people who need to do backporting and upstream
work for various reasons.

While (kindly, no offense intended) debating: since we have modprobe.blacklist,
why do we still need the 'nomodeset' parameter?
Why not try modprobe.blacklist="amdgpu,radeon,i915,ast,nouveau,gma500_gfx, ..."?

:-/


But overall, OK, I will follow your advice.


> Best regards
> Thomas
>
> [1] 
> https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83
>
>
>> for the modeset parameter, authors of various device driver try to 
>> make the usage not
>> conflict with others. I believe that this is good thing for Linux users.
>> It is probably the responsibility of the drm core maintainers to 
>> force various drm
>> drivers to reach a minimal consensus. Probably it pains to do so and 
>> doesn't pay off.
>> But reach a minimal consensus do benefit to Linux users.
>>
>>
>>> You can use modprobe.blacklist or initcall_blacklist on the kernel 
>>> command line.
>>>
>> There are some cases where the modprobe.blacklist doesn't works,
>> I have come cross several time during the past.
>> Because the device selected by the VGAARB is device-level thing,
>> it is not the driver's problem.
>>
>> Sometimes when VGAARB has a bug, it will select a wrong device as 
>> primary.
>> And the X server will use this wrong device as primary and completely 
>> crash
>> there, due to lack a driver. Take my old S3 Graphics as an example:
>>
>> $ lspci | grep VGA
>>
>>   00:06.1 VGA compatible controller: Loongson Technology LLC DC 
>> (Display Controller) (rev 01)
>>   03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
>>   07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 
>> (rev 01)
>>   08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 
>> (rev 01)
>>
>> Before apply this patch:
>>
>> [    0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
>> [    0.361753] pci 0000:00:06.1: vgaarb: VGA device added: 
>> decodes=io+mem,owns=io+mem,locks=none
>> [    0.361765] pci 0000:03:00.0: vgaarb: VGA device added: 
>> decodes=io+mem,owns=none,locks=none
>> [    0.361773] pci 0000:07:00.0: vgaarb: VGA device added: 
>> decodes=io+mem,owns=none,locks=none
>> [    0.361779] pci 0000:08:00.0: vgaarb: VGA device added: 
>> decodes=io+mem,owns=none,locks=none
>> [    0.361781] vgaarb: loaded
>> [    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
>> [    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
>> [    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070
>>
>>
>> For known reason, one of my system select the S3 Graphics as primary 
>> GPU.
>> But this S3 Graphics not even have a decent drm upstream driver yet.
>> Under such a case, I begin to believe that only the device who has a
>> driver deserve the primary.
>>
>> In such a situation, I want to reboot and enter the graphical environment
>> with one of the other working video cards, either the platform-integrated
>> or the discrete GPU. This does not mean I should compromise by removing
>> the S3 Graphics card from the motherboard, nor that I should change my
>> BIOS settings, as sometimes the BIOS is even worse.
>>
>> With this series applied, all I need to do is reboot the computer and
>> pass a command-line parameter. By forcing another video card (one with
>> decent driver support) to be primary, I can debug in a graphical
>> environment and examine what is wrong with vgaarb on a specific platform
>> under the X server.
>>
>> I can also try compiling a driver for this card and see whether it works,
>> simply by rebooting, without changing anything else. It is very
>> efficient. So this is probably the second use case for my patch: it hands
>> control back to the graphics developer.
>>
>>
>
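The `decodes=`, `owns=` and `locks=` fields in the vgaarb log lines above are easier to compare across devices once they are turned into bitmasks. The following is a minimal user-space sketch written for illustration. It assumes the fields only ever carry the strings `io+mem`, `io`, `mem` and `none`, and that the two flag values mirror `VGA_RSRC_LEGACY_IO` and `VGA_RSRC_LEGACY_MEM` from `include/linux/vgaarb.h`. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror VGA_RSRC_LEGACY_IO / VGA_RSRC_LEGACY_MEM in
 * include/linux/vgaarb.h; local names used only in this sketch. */
#define LEGACY_IO  0x01
#define LEGACY_MEM 0x02

/* Map one state string ("io+mem", "io", "mem", "none") to a mask.
 * Returns -1 for anything else. */
static int state_to_mask(const char *s, size_t len)
{
	if (len == 6 && strncmp(s, "io+mem", 6) == 0)
		return LEGACY_IO | LEGACY_MEM;
	if (len == 2 && strncmp(s, "io", 2) == 0)
		return LEGACY_IO;
	if (len == 3 && strncmp(s, "mem", 3) == 0)
		return LEGACY_MEM;
	if (len == 4 && strncmp(s, "none", 4) == 0)
		return 0;
	return -1;
}

/* Decode the value of "key=" in a vgaarb log line, e.g. key "owns".
 * The key must start at a token boundary, so "decodes" does not match
 * inside "olddecodes". Returns -1 if the key is absent or malformed. */
int vgaarb_field_mask(const char *line, const char *key)
{
	size_t klen = strlen(key);
	const char *p = line;

	if (klen == 0)
		return -1;

	while ((p = strstr(p, key)) != NULL) {
		int at_boundary = (p == line) || strchr(" ,:", p[-1]) != NULL;

		if (at_boundary && p[klen] == '=') {
			const char *val = p + klen + 1;

			/* A value ends at ',', ':', whitespace or end of line. */
			return state_to_mask(val, strcspn(val, ",: \n"));
		}
		p += klen;
	}
	return -1;
}
```

Applied to a "changed VGA decodes" line such as the amdgpu one later in this thread, the helper would report `decodes` as 0 while `owns` stays at both legacy resources.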
Sui Jingfeng Sept. 6, 2023, 10:31 a.m. UTC | #26
Hi,

On 2023/9/6 14:45, Christian König wrote:
>> The firmware framebuffer device has already been removed by the
>> drm_aperture_remove_conflicting_pci_framebuffers() function (or its
>> siblings). So this series is definitely not meant to interact with the
>> firmware framebuffer (or more intelligent framebuffer drivers). It is for
>> user-space programs, such as the X server and Wayland compositors. It is
>> for Linux users or DRM driver testers, allowing them to direct the
>> graphical display server to use the hardware of interest as the primary
>> video card.
>>
>> Also, I believe that the X server and Wayland compositors are the best
>> test cases. If a specific DRM driver can't work with the X server as
>> primary, then something is probably wrong.
>>
>>
>>> But what's the use case for overriding this setting?
>>>
>>
>> On a specific machine with multiple GPUs mounted, only the primary
>> graphics card gets POSTed (initialized) by the firmware. Therefore, the
>> DRM drivers for the remaining video cards have to work without the
>> prerequisite setup done by the firmware, which is called POST.
>
> Well, you don't seem to understand the background here. This is 
> perfectly normal behavior.
>
> Secondary cards are posted after loading the appropriate DRM driver. 
> At least for amdgpu this is done by calling the appropriate functions 
> in the BIOS. 


Well, thanks for telling me this. You know more than I do and definitely have a better understanding.

Are you telling me that the POST function for AMDGPU resides in the BIOS?
Does the kernel call into the BIOS?
Does the BIOS here refer to the UEFI runtime, the ATOM BIOS, or something else?

But the POST function for the DRM ast driver resides in kernel space (in other words, in ast.ko).
Is this statement correct?

I mean that for the ASPEED BMC chip, if the firmware does not POST the display controller,
then we have to POST it in kernel space before doing any modeset operations.
We can only POST this chip by directly programming its registers.
Is my understanding of the ast DRM driver correct?

Thanks for your reviews.
Christian König Sept. 6, 2023, 10:50 a.m. UTC | #27
Am 06.09.23 um 12:31 schrieb Sui Jingfeng:
> Hi,
>
> On 2023/9/6 14:45, Christian König wrote:
>>> Firmware framebuffer device already get killed by the 
>>> drm_aperture_remove_conflicting_pci_framebuffers()
>>> function (or its siblings). So, this series is definitely not to 
>>> interact with the firmware framebuffer
>>> (or more intelligent framebuffer drivers).  It is for user space 
>>> program, such as X server and Wayland
>>> compositor. Its for Linux user or drm drivers testers, which allow 
>>> them to direct graphic display server
>>> using right hardware of interested as primary video card.
>>>
>>> Also, I believe that X server and Wayland compositor are the best 
>>> test examples.
>>> If a specific DRM driver can't work with X server as a primary,
>>> then there probably have something wrong.
>>>
>>>
>>>> But what's the use case for overriding this setting?
>>>>
>>>
>>> On a specific machine with multiple GPUs mounted,
>>> only the primary graphics get POST-ed (initialized) by the firmware.
>>> Therefore, the DRM drivers for the rest video cards, have to choose to
>>> work without the prerequisite setups done by firmware, This is 
>>> called as POST.
>>
>> Well, you don't seem to understand the background here. This is 
>> perfectly normal behavior.
>>
>> Secondary cards are posted after loading the appropriate DRM driver. 
>> At least for amdgpu this is done by calling the appropriate functions 
>> in the BIOS. 
>
>
> Well, thanks for you tell me this. You know more than me and 
> definitely have a better understanding.
>
> Are you telling me that the POST function for AMDGPU reside in the BIOS?
> The kernel call into the BIOS?

Yes, exactly that.

> Does the BIOS here refer to the UEFI runtime or ATOM BIOS or something 
> else?

On dGPUs it's the VBIOS on a flashrom on the board, for iGPUs (APUs as 
AMD calls them) it's part of the system BIOS.

UEFI is actually just a small subsystem in the system BIOS which 
replaced the old interface used between system BIOS, video BIOS and 
operating system.

>
> But the POST function for the drm ast, reside in the kernel space (in 
> other word, in ast.ko).
> Is this statement correct?

I don't know the ast driver well enough to answer that, but I assume 
they just read the BIOS and execute the appropriate functions.

>
> I means that for ASpeed BMC chip, if the firmware not POST the display 
> controller.
> Then we have to POST it at the kernel space before doing various 
> modeset option.
> We can only POST this chip by directly operate the various registers.
> Am I correct for the judgement about ast drm driver?

Well POST just means Power On Self Test, but what you mean is 
initializing the hardware.

Some drivers can of course initialize the hardware without the help of 
the BIOS, but I don't think AST can do that. As far as I know it's a 
relatively simple driver.

BTW firmware is not the same as the BIOS (which runs the POST), firmware 
usually refers to something run on microcontrollers inside the ASIC 
while the (system or video) BIOS runs on the host CPU.

Regards,
Christian.

Thomas Zimmermann Sept. 6, 2023, 11:06 a.m. UTC | #28
Hi

Am 06.09.23 um 11:48 schrieb suijingfeng:
[...]
> 
>> There's 'nomodeset', which disables all native drivers. It's useful 
>> for debugging or as a quick-fix if the graphics driver breaks. If you 
>> want to disable a specific driver, please use one of the options for 
>> blacklisting.
>>
> Yeah, 'nomodeset' disables all native drivers;
> this is its strength, but it is also its weakness.

Well, that's by design. Graphics is at the core of the user experience. 
We often cannot _not_ provide it. And if it's broken, there needs to be 
a reliable fallback. There needs to be at least enough graphics support 
to run a terminal and repair the system. And it also needs to be simple 
enough for the average user. Falling back to serial terminals is often 
not an option.

At least here at SUSE, when users or customers report a broken graphics 
driver, we can tell them to start with 'nomodeset' and get at least the 
basic graphics. That's good enough for most productivity/office 
software. In the meantime, we investigate the problem.

There were concerns about the need of nomodeset, but I think it has 
proven to be useful in practice.

> Sometimes, when you are developing a DRM driver for a new device,
> you will feel the pain. Too often, a programmer's modification
> makes the entire Linux kernel hang. The problematic DRM
> driver kernel module is already in the initrd. In that case, the real
> need is to disable only the malfunctioning DRM driver kernel module,
> whereas what you recommend disables them all. There is a
> subtle difference.

I found that initcall_blacklist=<func name> works reliably for me.
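Both `modprobe.blacklist=` and `initcall_blacklist=` take a comma-separated list on the kernel command line. A small checker like the one below, a sketch that assumes unquoted values separated by single spaces and uses hypothetical names, can confirm which drivers a given boot actually blocked when fed the contents of `/proc/cmdline`. It says nothing about which device vgaarb picked as the boot VGA device; that is a separate, device-level decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` is one of the comma-separated values of `param`
 * (for example "modprobe.blacklist") in a kernel command line string.
 * Assumes unquoted values separated by single spaces. */
bool cmdline_list_contains(const char *cmdline, const char *param,
			   const char *name)
{
	size_t plen = strlen(param), nlen = strlen(name);
	const char *p = cmdline;

	if (plen == 0 || nlen == 0)
		return false;

	while ((p = strstr(p, param)) != NULL) {
		/* Only match the parameter at the start of a word. */
		bool at_word_start = (p == cmdline) || p[-1] == ' ';

		if (at_word_start && p[plen] == '=') {
			const char *v = p + plen + 1;
			const char *end = v + strcspn(v, " \n");

			/* Walk the comma-separated items up to the next space. */
			while (v < end) {
				size_t len = strcspn(v, ", \n");

				if (len == nlen && strncmp(v, name, nlen) == 0)
					return true;
				v += len;
				if (*v == ',')
					v++;
			}
		}
		p += plen;
	}
	return false;
}
```

Matching whole items, rather than using a plain substring search, keeps a query for "amd" from matching "amdgpu".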

> 
> Another limitation of the 'nomodeset' parameter is that
> it is only available in recent upstream kernels. Older
> downstream kernels don't support this parameter yet.
> This creates an inconsistent development experience. I believe
> there will always be people who need to do backporting and
> upstream work for various reasons.

Nomodeset used to be there, but in a different form. It forced VGA text 
mode IIRC. 'git grep' for vgacon_text_force() in an old kernel. We adopted 
the parameter for all of graphics, because it already did what we needed.

Best regards
Thomas

> 
> While (kindly, no offensive) debating, since we have the modprobe.blacklist
> why we still need the 'nomodeset' parameter ?
> why not try 
> modprobe.blacklist="amdgpu,radeon,i915,ast,nouveau,gma500_gfx, ..."
> 
> :-/
> 
> 
> But OK in overall, I will listen to your advice.
> 
> 
>> Best regards
>> Thomas
>>
>> [1] 
>> https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83
>>
>>
>>> for the modeset parameter, authors of various device driver try to 
>>> make the usage not
>>> conflict with others. I believe that this is good thing for Linux users.
>>> It is probably the responsibility of the drm core maintainers to 
>>> force various drm
>>> drivers to reach a minimal consensus. Probably it pains to do so and 
>>> doesn't pay off.
>>> But reach a minimal consensus do benefit to Linux users.
>>>
>>>
>>>> You can use modprobe.blacklist or initcall_blacklist on the kernel 
>>>> command line.
>>>>
>>> There are some cases where the modprobe.blacklist doesn't works,
>>> I have come cross several time during the past.
>>> Because the device selected by the VGAARB is device-level thing,
>>> it is not the driver's problem.
>>>
>
Alex Williamson Sept. 6, 2023, 7:29 p.m. UTC | #29
On Wed, 6 Sep 2023 11:51:59 +0800
Sui Jingfeng <sui.jingfeng@linux.dev> wrote:

> Hi,
> 
> 
> On 2023/9/5 22:52, Alex Williamson wrote:
> > On Tue,  5 Sep 2023 03:57:15 +0800
> > Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
> >  
> >> From: Sui Jingfeng <suijingfeng@loongson.cn>
> >>
> >> On a machine with multiple GPUs, a Linux user has no control over which
> >> one is primary at boot time. This series tries to solve above mentioned
> >> problem by introduced the ->be_primary() function stub. The specific
> >> device drivers can provide an implementation to hook up with this stub by
> >> calling the vga_client_register() function.
> >>
> >> Once the driver bound the device successfully, VGAARB will call back to
> >> the device driver. To query if the device drivers want to be primary or
> >> not. Device drivers can just pass NULL if have no such needs.
> >>
> >> Please note that:
> >>
> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
> >>     like to mount at least three video cards.
> >>
> >> 2) Typically, those non-86 machines don't have a good UEFI firmware
> >>     support, which doesn't support select primary GPU as firmware stage.
> >>     Even on x86, there are old UEFI firmwares which already made undesired
> >>     decision for you.
> >>
> >> 3) This series is attempt to solve the remain problems at the driver level,
> >>     while another series[1] of me is target to solve the majority of the
> >>     problems at device level.
> >>
> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> >> 630 is the default boot VGA, successfully override by ast2400 with
> >> ast.modeset=10 append at the kernel cmd line.
> >>
> >> $ lspci | grep VGA
> >>
> >>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]  
> > In all my previous experiments with VGA routing and IGD I found that
> > IGD can't actually release VGA routing and Intel confirmed the hardware
> > doesn't have the ability to do so.  
> 
> Which IGD model are you using? Even for the IGD in the Atom D2550,
> the legacy 128KB VGA memory range can be configured to be mapped either to
> the IGD or to the DMI interface. See section 1.7.3.2 of the N2000 datasheet[1].

I believe it's the VGA I/O that can't be disabled; there's no means to
do so other than the I/O enable bit in the command register, and IIRC
the driver depends on this for other features.  The history of this is
pretty old, but here are some links:

https://lore.kernel.org/all/1376486637.31494.19.camel@ul30vt.home/
https://bbs.archlinux.org/viewtopic.php?pid=1400212#p1400212
https://lore.kernel.org/all/20130815223917.27890.28003.stgit@bling.home/
https://lore.kernel.org/all/20130824144701.23370.42110.stgit@bling.home/
https://lore.kernel.org/all/20140509201655.2849.97478.stgit@bling.home/
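The point above is that the only switch for the IGD's VGA I/O decoding is the I/O Space enable bit in the PCI Command register, and clearing that bit stops all I/O decoding by the device, not just the legacy VGA ports. As a rough illustration, the sketch below reads that register from a config-space dump. The offset and bit values follow `include/uapi/linux/pci_regs.h` (`PCI_COMMAND`, `PCI_COMMAND_IO`, `PCI_COMMAND_MEMORY`); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values from include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND        0x04   /* 16-bit Command register offset */
#define PCI_COMMAND_IO     0x1    /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2    /* respond to memory space accesses */

/* PCI config space is little-endian regardless of the host CPU. */
uint16_t pci_command_reg(const uint8_t *cfg)
{
	return (uint16_t)(cfg[PCI_COMMAND] | (cfg[PCI_COMMAND + 1] << 8));
}

int io_decode_enabled(uint16_t cmd)  { return (cmd & PCI_COMMAND_IO) != 0; }
int mem_decode_enabled(uint16_t cmd) { return (cmd & PCI_COMMAND_MEMORY) != 0; }

/* Read the first `len` bytes of a sysfs config file such as
 * /sys/bus/pci/devices/0000:00:02.0/config. Returns bytes read or -1. */
long read_pci_config(const char *path, uint8_t *cfg, size_t len)
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return -1;
	n = fread(cfg, 1, len, f);
	fclose(f);
	return (long)n;
}
```

For example, reading 64 bytes from the IGD's config file (00:02.0 in the lspci output quoted below) and passing them to `pci_command_reg()` should give the same I/O and memory enable state that `lspci -vv` prints on its `Control:` line.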

I think the issue was that i915 doesn't claim to the VGA arbiter to be
controlling legacy VGA ranges, but in fact the hardware does claim
those ranges.  We can "fix" i915 to report that VGA MMIO space is
owned and can be controlled, but then Xorg likely sees multiple VGA
arbiter clients and disables DRI because it wants to mmap VGA MMIO
space.

Therefore unless something has changed in the past 10yrs, i915 owns but
does not advertise ownership of the VGA address spaces and therefore
the arbiter can't and doesn't know to change VGA routing to enable a
"be_primary" path to another device.
 
> If a specific Intel model has a bug in its VGA routing hardware logic,
> I would like to ignore it, or switch to UEFI firmware on such hardware.

That's a convenient and impractical approach.  I expect all Intel HD
graphics has this issue.  Unknown for Xe.

> It is the hardware engineers' responsibility; I will not worry about it.

We often need to deal with broken hardware in the kernel.

> Thanks for telling me this.
> 
> [1] https://www.intel.com/content/dam/doc/datasheet/atom-d2000-n2000-vol-2-datasheet.pdf
> 
> 
> >   It will always be primary from a
> > VGA routing perspective.  Was this actually tested with non-UEFI?  
> 
> 
> As you already said, Intel has generously confirmed the hardware defect.
> So this is probably a good opportunity to switch to UEFI to solve the problem.
> Then no legacy testing is needed.

Then why are we hacking on VGA arbitration in this series at all?

> > I suspect it might only work in UEFI mode where we probably don't
> > actually have a dependency on VGA routing.  This is essentially why
> > vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> > broken to use on Intel systems with IGD.  Thanks,  
> 
> Thanks for telling me this.
> 
> To be honest, I have only tested my patch on machines with UEFI firmware,
> since UEFI has become mainstream. If this patch is really useful for the
> majority of machines, I'm satisfied. The result is not too bad.

This looks like a pretty significant scoping issue if you're proposing
changes to the VGA arbiter which specifically handles the routing of
legacy VGA address spaces but are not willing to commit to testing
legacy configurations.  Thanks,

Alex
Sui Jingfeng Sept. 7, 2023, 2:30 a.m. UTC | #30
Hi,


On 2023/9/6 17:40, Christian König wrote:
> Am 06.09.23 um 11:08 schrieb suijingfeng:
>> Well, welcome to correct me if I'm wrong.
>
> You seem to have some very basic misunderstandings here.
>
> The term framebuffer describes some VRAM memory used for scanout.
>
> This framebuffer is exposed to userspace through some framebuffer 
> driver, on UEFI platforms that is usually efifb but can be quite a 
> bunch of different drivers.
>
> When the DRM drivers load they remove the previous drivers using 
> drm_aperture_remove_conflicting_pci_framebuffers() (or similar 
> function), but this does not mean that the framebuffer or scanout 
> parameters are modified in any way. It just means that the framebuffer 
> is just no longer exposed through this driver.
>
> Take over is the perfectly right description here because that's 
> exactly what's happening. The framebuffer configuration including the 
> VRAM memory as well as the parameters for scanout are exposed by the 
> newly loaded DRM driver.
>
> In other words userspace can query through the DRM interfaces which 
> monitors already driven by the hardware and so in your terminology 
> figure out which is the primary one.
>
I'm not entirely convinced by this idea, though you might be correct.
But there are cases where there are multiple monitors and each video card
connects to one.

It is also quite common that no monitor is connected: you let the machine
boot first, then connect a monitor to a random display output and see which
one displays something. I don't expect the primary to change along with that.
The primary one has to be determined as early as possible, because the VGA
console and the framebuffer console may output directly to the primary.
Getting DDC and/or HPD involved would only complicate the problem.

There are ASPEED BMCs that add a virtual connector in order to be able to display remotely.
There are also commands to force a connector into the connected state.


> It's just that as Thomas explained as well that this completely 
> irrelevant to any modern desktop. Both X and Wayland both iterate the 
> available devices and start rendering to them which one was used 
> during boot doesn't really matter to them.
>
You may be correct, but I'm still not sure.
I probably need more time to investigate.
My colleagues and I mainly use the X server,
with versions ranging from 1.20.4 to 1.21.1.4.
Even if this is true, the problems still exist for non-modern desktops.

> Apart from that ranting like this and trying to explain stuff to 
> people who obviously have much better background in the topic is not 
> going to help your patches getting upstream.
>

Thanks for sharing so much knowledge with me.
I realize where the problems are now.
I will try to address the concerns in the next version.


> Regards,
> Christian.
>
Christian König Sept. 7, 2023, 9:08 a.m. UTC | #31
Am 07.09.23 um 04:30 schrieb Sui Jingfeng:
> Hi,
>
>
> On 2023/9/6 17:40, Christian König wrote:
>> Am 06.09.23 um 11:08 schrieb suijingfeng:
>>> Well, welcome to correct me if I'm wrong.
>>
>> You seem to have some very basic misunderstandings here.
>>
>> The term framebuffer describes some VRAM memory used for scanout.
>>
>> This framebuffer is exposed to userspace through some framebuffer 
>> driver, on UEFI platforms that is usually efifb but can be quite a 
>> bunch of different drivers.
>>
>> When the DRM drivers load they remove the previous drivers using 
>> drm_aperture_remove_conflicting_pci_framebuffers() (or similar 
>> function), but this does not mean that the framebuffer or scanout 
>> parameters are modified in any way. It just means that the 
>> framebuffer is just no longer exposed through this driver.
>>
>> Take over is the perfectly right description here because that's 
>> exactly what's happening. The framebuffer configuration including the 
>> VRAM memory as well as the parameters for scanout are exposed by the 
>> newly loaded DRM driver.
>>
>> In other words userspace can query through the DRM interfaces which 
>> monitors already driven by the hardware and so in your terminology 
>> figure out which is the primary one.
>>
> I'm a little bit of not convinced about this idea, you might be correct.

Well I can point you to the code if you don't believe me.

> But there cases where three are multiple monitors and each video card
> connect one.

Yeah, but this is irrelevant. The key point is the configuration is 
taken over when the driver loads.

So whatever is there before as setup (one monitor showing console, three 
monitors mirrored, whatever) should be there after loading the driver as 
well. This configuration is just immediately overwritten because nobody 
cares about it.

>
> It also quite common that no monitors is connected, let the machine boot
> first, then find a monitors to connect to a random display output. See
> which will display. I don't expect the primary shake with.
> The primary one have to be determined as early as possible, because of
> the VGA console and the framebuffer console may directly output the 
> primary.

Well, that is simply not correct. There is no concept of a "primary" 
display; it can just be that a monitor was brought up by the BIOS or 
bootloader and we take over this configuration.

> Get the DDC and/or HPD involved may necessary complicated the problem.
>
> There are ASpeed BMC who add a virtual connector in order to able 
> display remotely.
> There are also have commands to force a connector to be connected status.
>
>
>> It's just that as Thomas explained as well that this completely 
>> irrelevant to any modern desktop. Both X and Wayland both iterate the 
>> available devices and start rendering to them which one was used 
>> during boot doesn't really matter to them.
>>
> You may be correct, but I'm still not sure.
> I probably need more times to investigate.
> Me and my colleagues are mainly using X server,
> the version varies from 1.20.4 and 1.21.1.4.
> Even this is true, the problems still exist for non-modern desktops.

Well, I have over 25 years of experience with display hardware and what 
you describe here was never an issue.

What you have is simply a broken display driver which for some reason 
can't handle your use case.

I strongly suggest that you just completely drop this here and go into 
the AST driver and try to fix it.

Regards,
Christian.


Jani Nikula Sept. 7, 2023, 9:43 a.m. UTC | #32
On Wed, 06 Sep 2023, suijingfeng <suijingfeng@loongson.cn> wrote:
> Another limitation of the 'nomodeset' parameter is that
> it is only available on recent upstream kernel. Low version
> downstream kernel don't has this parameter supported yet.
> So this create inconstant developing experience. I believe that
> there always some people need do back-port and upstream work
> for various reasons.

While that may be true, it's not an argument in favour of adding new
module parameters or special values to existing module parameters. They
would have to be backported just as well.

BR,
Jani.
Sui Jingfeng Sept. 7, 2023, 12:32 p.m. UTC | #33
Hi,


On 2023/9/7 17:08, Christian König wrote:
> Well, I have over 25 years of experience with display hardware and 
> what you describe here was never an issue. 

I want to give you an example to let you know more.

I have an ASRock AD2550B-ITX board[1].
When another discrete video card is mounted in its mini PCIe slot or PCI slot,
the IGD can no longer be the primary display adapter; the display stays completely black.
I have tried to draft a few trivial patches to help fix this[2].

I want to use the IGD as primary; does this count as an issue?

[1] https://www.asrock.com/mb/Intel/AD2550-ITX/
[2] https://patchwork.freedesktop.org/series/123073/
Christian König Sept. 7, 2023, 12:43 p.m. UTC | #34
Am 07.09.23 um 14:32 schrieb suijingfeng:
> Hi,
>
>
> On 2023/9/7 17:08, Christian König wrote:
>> Well, I have over 25 years of experience with display hardware and 
>> what you describe here was never an issue. 
>
> I want to give you an example to let you know more.
>
> I have a ASRock AD2550B-ITX board[1],
> When another discrete video card is mounted into it mini PCIe slot or 
> PCI slot,
> The IGD cannot be the primary display adapter anymore. The display is 
> totally black.
> I have try to draft a few trivial patch to help fix this[2].
>
> And I want to use the IGD as primary, does this count as an issue?

No, this is completely expected behavior and a limitation of the 
hardware design.

As far as I know both AMD and Intel GPUs work the same here.

Regards,
Christian.

Sui Jingfeng Sept. 7, 2023, 3:26 p.m. UTC | #35
Hi,


On 2023/9/7 20:43, Christian König wrote:
> Am 07.09.23 um 14:32 schrieb suijingfeng:
>> Hi,
>>
>>
>> On 2023/9/7 17:08, Christian König wrote:
>>> Well, I have over 25 years of experience with display hardware and 
>>> what you describe here was never an issue. 
>>
>> I want to give you an example to let you know more.
>>
>> I have a ASRock AD2550B-ITX board[1],
>> When another discrete video card is mounted into it mini PCIe slot or 
>> PCI slot,
>> The IGD cannot be the primary display adapter anymore. The display is 
>> totally black.
>> I have try to draft a few trivial patch to help fix this[2].
>>
>> And I want to use the IGD as primary, does this count as an issue?
>
> No, this is completely expected behavior and a limitation of the 
> hardware design.
>
> As far as I know both AMD and Intel GPUs work the same here.
>
> Regards,
> Christian.
>
>>
>> [1] https://www.asrock.com/mb/Intel/AD2550-ITX/
>> [2] https://patchwork.freedesktop.org/series/123073/
>>

Then let me give you another example; see below for a detailed description.
I have an AMD BC160 GPU; see [1] for what it looks like.

The GPU does not expose a display connector interface.
It can effectively be seen as a render-only or compute-class GPU for bitcoin mining.
But its firmware still claims that this GPU is VGA compatible.
When this GPU is mounted on the motherboard, the system always selects it as primary,
yet this GPU cannot be connected to a monitor.

In this situation, modprobe.blacklist=amdgpu doesn't work either,
because vgaarb always selects this GPU as primary; this is a device-level decision.

$ dmesg | grep vgaarb:

[    3.541405] pci 0000:0c:00.0: vgaarb: BAR 0: [mem 0xa0000000-0xafffffff 64bit pref] contains firmware FB [0xa0000000-0xa02fffff]
[    3.901448] pci 0000:05:00.0: vgaarb: setting as boot VGA device
[    3.905375] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    3.905382] pci 0000:0c:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    3.909375] pci 0000:0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    3.913375] pci 0000:0d:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    3.913377] vgaarb: loaded
[   13.513760] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[   19.020992] amdgpu 0000:0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
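The first log line shows why 0c:00.0 becomes the boot VGA device: its BAR 0 contains the firmware framebuffer, and that decision is logged at about 3.9 s, well before amdgpu initializes at around 13 s. A simplified sketch of such a containment test, using the addresses from that line, is below. It is an illustration with hypothetical names, not the vgaarb code itself, which (as far as we know) works on `struct resource` and takes the framebuffer location from the firmware-provided `screen_info`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as the kernel prints it: [start-end]. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* True if the firmware framebuffer lies entirely within the BAR.
 * A BAR with a zero start or end is treated as unassigned. */
bool bar_contains_fb(struct addr_range bar, struct addr_range fb)
{
	if (bar.start == 0 || bar.end == 0)
		return false;
	return fb.start >= bar.start && fb.end <= bar.end;
}
```

With BAR 0 as [0xa0000000-0xafffffff] and the framebuffer as [0xa0000000-0xa02fffff], the check returns true. Whether a driver is later blacklisted plays no part in it, which matches the device-level behaviour described above.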

I'm using an Ubuntu 22.04 system. With ast.modeset=10 passed on the command line,
I am still able to enter the graphical environment and use this GPU as a render-only GPU,
so I can continue to examine what's wrong. Apart from this, drm/amdgpu reports
"*ERROR* IB test failed on sdma0 (-110)" to me.

Does this count as problem?

Until I find a solution, I have to keep this de facto render-only GPU mounted,
because I need to recompile the kernel module, install it, and test it.

All I need is a 2D video card to display something; the ast DRM driver is OK, despite being simple.
It suits my daily usage with Vim, and that's enough for me.

Now, the real questions I want to ask are:

1)

When the kernel driver module is blocked (by modprobe.blacklist=amdgpu),
vgaarb still selects the device as primary, which leaves the X server crashing (because no kernel-space driver is loaded).
Does this count as a problem?


2)

Does my approach of mounting another GPU as the primary display adapter,
while the real purpose is fixing bugs in and developing for another GPU,
count as a use case?
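The proposal from the cover letter can be modeled in a few lines: after a driver binds, the arbiter asks it through a `->be_primary()` callback whether it wants to be primary, and a NULL callback means no preference. The user-space sketch below is purely hypothetical (the structure, field and function names are not taken from the series). It only illustrates the selection rule, and why a blacklisted driver, which never binds, is never asked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one VGA device as seen by the arbiter. */
struct vga_dev_model {
	const char *bdf;          /* PCI address, e.g. "0000:0c:00.0" */
	bool driver_bound;        /* a kernel driver bound successfully */
	bool (*be_primary)(void); /* NULL: driver has no preference */
};

/* Pick the primary device: the first bound device whose driver asks
 * for it, otherwise the firmware's default choice. */
size_t pick_primary(const struct vga_dev_model *devs, size_t n,
		    size_t firmware_default)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].driver_bound && devs[i].be_primary &&
		    devs[i].be_primary())
			return i;
	}
	return firmware_default;
}

/* Example callback; a real driver might base this on a module option. */
bool always_wants_primary(void)
{
	return true;
}
```

When no bound driver opts in, the firmware default stays primary even if no driver is bound to it, which is the situation described in question 1.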


$ cat demsg.txt | grep drm

[   10.099888] ACPI: bus type drm_connector registered
[   11.083920] etnaviv 0000:0d:00.0: [drm] bind etnaviv-display, master name: 0000:0d:00.0
[   11.084106] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:0d:00.0 on minor 0
[   13.301702] [drm] amdgpu kernel modesetting enabled.
[   13.359820] [drm] initializing kernel modesetting (NAVI12 0x1002:0x7360 0x1002:0x0A34 0xC7).
[   13.368246] [drm] register mmio base: 0xEB100000
[   13.372861] [drm] register mmio size: 524288
[   13.380788] [drm] add ip block number 0 <nv_common>
[   13.385661] [drm] add ip block number 1 <gmc_v10_0>
[   13.390531] [drm] add ip block number 2 <navi10_ih>
[   13.395405] [drm] add ip block number 3 <psp>
[   13.399760] [drm] add ip block number 4 <smu>
[   13.404111] [drm] add ip block number 5 <dm>
[   13.408378] [drm] add ip block number 6 <gfx_v10_0>
[   13.413249] [drm] add ip block number 7 <sdma_v5_0>
[   13.433546] [drm] add ip block number 8 <vcn_v2_0>
[   13.433547] [drm] add ip block number 9 <jpeg_v2_0>
[   13.497757] [drm] VCN decode is enabled in VM mode
[   13.502540] [drm] VCN encode is enabled in VM mode
[   13.508785] [drm] JPEG decode is enabled in VM mode
[   13.529596] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[   13.564762] [drm] Detected VRAM RAM=8176M, BAR=256M
[   13.569628] [drm] RAM width 2048bits HBM
[   13.574167] [drm] amdgpu: 8176M of VRAM memory ready
[   13.579125] [drm] amdgpu: 15998M of GTT memory ready.
[   13.584184] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   13.590505] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[   13.598749] [drm] Found VCN firmware Version ENC: 1.16 DEC: 5 VEP: 0 Revision: 4
[   13.671786] [drm] reserve 0xe00000 from 0x81fd000000 for PSP TMR
[   13.801235] [drm] Display Core v3.2.247 initialized on DCN 2.0
[   13.807061] [drm] DP-HDMI FRL PCON supported
[   13.832382] [drm] kiq ring mec 2 pipe 1 q 0
[   13.838131] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   13.845877] [drm] JPEG decode initialized successfully.
[   14.072508] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:0c:00.0 on minor 1
[   14.080976] amdgpu 0000:0c:00.0: [drm] Cannot find any crtc or sizes
[   14.087341] [drm] DSC precompute is not needed.
[   16.487330] systemd[1]: Starting Load Kernel Module drm...
[  619.901873] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[  619.901898] [drm] PSP is resuming...
[  619.925307] [drm] reserve 0xe00000 from 0x81fd000000 for PSP TMR
[  619.991034] [drm] psp gfx command AUTOLOAD_RLC(0x21) failed and response status is (0xFFFF000D)
[  620.294366] [drm] kiq ring mec 2 pipe 1 q 0
[  620.298953] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  620.299103] [drm] JPEG decode initialized successfully.
[  621.309543] [drm:sdma_v5_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out
[  621.317577] amdgpu 0000:0c:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
[  622.333548] [drm:sdma_v5_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out
[  622.341587] amdgpu 0000:0c:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
[  622.354071] [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
[  622.363721] amdgpu 0000:0c:00.0: [drm] Cannot find any crtc or sizes

[1] https://www.techpowerup.com/gpu-specs/xfx-bc-160.b9346
Christian König Sept. 7, 2023, 3:32 p.m. UTC | #36
On 07.09.23 at 17:26, suijingfeng wrote:
> [SNIP]

> Then I'll give you another example; see below for a detailed description.
> I have an AMD BC160 GPU; see [1] for what it looks like.
>
> The GPU doesn't expose any display connector.
> It can effectively be seen as a render-only or compute-class GPU for
> bitcoin mining.
> But its firmware still claims that the GPU is VGA compatible.
> When this GPU is mounted on the motherboard, the system always selects
> it as primary.
> But this GPU can't be connected to a monitor.
>
> In this situation, modprobe.blacklist=amdgpu doesn't work either,
> because vgaarb always selects this GPU as primary; this is a
> device-level decision.

It's not VGAARB which makes this selection, it's the BIOS. VGAARB just 
detects what the BIOS has decided.

>
> $ dmesg | grep vgaarb:
>
> [    3.541405] pci 0000:0c:00.0: vgaarb: BAR 0: [mem 0xa0000000-0xafffffff 64bit pref] contains firmware FB [0xa0000000-0xa02fffff]
> [    3.901448] pci 0000:05:00.0: vgaarb: setting as boot VGA device
> [    3.905375] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    3.905382] pci 0000:0c:00.0: vgaarb: setting as boot VGA device (overriding previous)
> [    3.909375] pci 0000:0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    3.913375] pci 0000:0d:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    3.913377] vgaarb: loaded
> [   13.513760] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
> [   19.020992] amdgpu 0000:0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
>
> I'm using Ubuntu 22.04 with ast.modeset=10 passed on the kernel command
> line, and I am still able to enter the graphical system, which treats
> this GPU as a render-only GPU.
> I will probably continue to examine what's wrong; apart from this,
> drm/amdgpu reports "*ERROR* IB test failed on sdma0 (-110)" to me.
>
> Does this count as a problem?

No, again that is perfectly expected behavior.

Some BIOSes (maybe most, by modern standards) allow you to override this,
but if you later override it from the OS, you run the hardware outside
of what has been validated.

When you put a VGA device into a board with an integrated VGA device, the
integrated one gets disabled. This is even part of some PCIe
specification IIRC.

So the problems you run into here are perfectly expected.
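The `contains firmware FB` line in the vgaarb log quoted above suggests how the firmware's choice becomes visible to the kernel: the framebuffer the firmware set up at boot (`0xa0000000-0xa02fffff`) lies inside BAR 0 of `0000:0c:00.0` (`0xa0000000-0xafffffff`). A minimal C sketch of such a containment test follows, under stated assumptions: `struct bar_range`, `bar_contains_fb`, and `find_fb_bar` are hypothetical names, bounds are treated as inclusive as printed in the log, and this is a simplified model rather than the actual vgaarb code, which considers more than this single check when choosing the boot device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one PCI memory BAR as printed in
 * "[mem 0xSTART-0xEND]": both bounds are inclusive. Not a kernel type. */
struct bar_range {
	uint64_t start;
	uint64_t end;
};

/* True if the firmware framebuffer [fb_start, fb_end] (inclusive) lies
 * entirely inside the BAR. */
bool bar_contains_fb(struct bar_range bar, uint64_t fb_start, uint64_t fb_end)
{
	/* Precondition: both ranges are well formed. */
	assert(bar.start <= bar.end && fb_start <= fb_end);
	return fb_start >= bar.start && fb_end <= bar.end;
}

/* Index of the first BAR that contains the framebuffer, or -1 if none
 * of the given BARs does. */
int find_fb_bar(const struct bar_range *bars, int nbars,
		uint64_t fb_start, uint64_t fb_end)
{
	for (int i = 0; i < nbars; i++) {
		if (bar_contains_fb(bars[i], fb_start, fb_end))
			return i;
	}
	return -1;
}
```

With the values from that log line, `find_fb_bar` on a single-entry array holding BAR 0 of `0000:0c:00.0` returns 0, i.e. the firmware framebuffer lives on the BC160.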

Regards,
Christian.

>
> Until I find a solution, I have to keep this de facto render-only GPU
> mounted, because I need to recompile the kernel module, install it,
> and test it.
>
> All I need is a 2D video card to display something; the ast DRM driver
> is OK, despite being simple.
> It suits my daily usage with Vim, and that's enough for me.
>
> Now, the real questions I want to ask are:
>
> 1)
>
> When the kernel driver module is blocked (by modprobe.blacklist=amdgpu),
> vgaarb still selects the device as primary, which leaves the X server
> crashing (because no kernel-space driver is loaded).
> Does this count as a problem?
>
>
> 2)
>
> Does my approach of mounting another GPU as the primary display adapter,
> while the real purpose is to fix bugs in and develop for the other GPU,
> count as a use case?
>
> [1] https://www.techpowerup.com/gpu-specs/xfx-bc-160.b9346
Sui Jingfeng Sept. 7, 2023, 4:33 p.m. UTC | #37
Hi,


On 2023/9/7 17:08, Christian König wrote:


> I strongly suggest that you just completely drop this here 


Dropping this is OK, no problem. I will go and work on something else.
This version was not originally intended to be merged, as it's an RFC.
Also, the core mechanism is already finished; it is the first patch in this series.
What's left is just policy (how to specify which device should be primary and how to parse the kernel command line); nothing interesting remains.
It is actually meant to fulfill my promise from v3 to give some examples of use cases.


> and go into the AST driver and try to fix it. 

Well, someone told me yesterday that this is well-defined behavior,
which implies that it is not a bug. I'm not going to fix a non-bug.
But if Thomas asks me to fix it, then I will probably have to try.
Still, I suggest that if things aren't broken, don't fix them; otherwise
this may cause bigger trouble. For the server single-display use case,
it is good enough.


Thanks.
Christian König Sept. 8, 2023, 6:59 a.m. UTC | #38
On 07.09.23 at 18:33, suijingfeng wrote:
> Hi,
>
>
> On 2023/9/7 17:08, Christian König wrote:
>
>
>> I strongly suggest that you just completely drop this here 
>
>
> Dropping this is OK, no problem. I will go and work on something else.
> This version was not originally intended to be merged, as it's an RFC.
> Also, the core mechanism is already finished; it is the first patch in
> this series.
> What's left is just policy (how to specify which device should be
> primary and how to parse the kernel command line); nothing interesting
> remains.
> It is actually meant to fulfill my promise from v3 to give some
> examples of use cases.
>
>
>> and go into the AST driver and try to fix it. 
>
> Well, someone told me yesterday that this is well-defined behavior,
> which implies that it is not a bug. I'm not going to fix a non-bug.

Sorry about that, I hadn't realized what you were actually trying to do.

> But if Thomas asks me to fix it, then I will probably have to try.
> Still, I suggest that if things aren't broken, don't fix them; otherwise
> this may cause bigger trouble. For the server single-display use case,
> it is good enough.

Yeah, that's exactly the reason why you shouldn't mess with this.

In theory you could try to re-program the necessary north bridge blocks 
to make integrated graphics work even if you installed a dedicated VGA 
adapter, but you will most likely be missing something.

The only real fix is to tell the BIOS that you want to use the 
integrated VGA device even if a dedicated one is detected.
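After changing the BIOS setting, one way to confirm which adapter the kernel ended up treating as the boot VGA device is to find the last `vgaarb: setting as boot VGA device` line in the kernel log: in the vgaarb log quoted earlier in this thread, a later device logs `(overriding previous)` when it replaces an earlier choice. A minimal C sketch, assuming exactly the `pci <address>: vgaarb: ...` message format seen in that log (other vgaarb code paths print different wording, and the per-device `boot_vga` attribute under `/sys/bus/pci/devices/<address>/` is a more direct source). `parse_boot_vga_line` and `last_boot_vga` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If line is a vgaarb "setting as boot VGA device" message in the assumed
 * "pci <address>: vgaarb: ..." format, copy <address> (e.g. "0000:0c:00.0")
 * into out and return 1. Otherwise leave out untouched and return 0. */
int parse_boot_vga_line(const char *line, char *out, size_t outlen)
{
	const char *marker = "vgaarb: setting as boot VGA device";
	const char *pci;
	const char *colon;
	size_t len;

	assert(line != NULL && out != NULL && outlen > 0);
	if (strstr(line, marker) == NULL)
		return 0;
	pci = strstr(line, "pci ");
	if (pci == NULL)
		return 0;
	pci += 4; /* skip "pci " */
	/* The address itself contains ':' but never ": ", so the first
	 * ": " marks its end. */
	colon = strstr(pci, ": ");
	if (colon == NULL)
		return 0;
	len = (size_t)(colon - pci);
	if (len >= outlen) /* no room for the terminating NUL */
		return 0;
	memcpy(out, pci, len);
	out[len] = '\0';
	return 1;
}

/* Scan log lines in order and keep the last match, since each later
 * "(overriding previous)" message replaces the earlier default.
 * Returns 1 if any boot VGA device was found. */
int last_boot_vga(const char *const *lines, size_t nlines,
		  char *out, size_t outlen)
{
	int found = 0;

	for (size_t i = 0; i < nlines; i++) {
		if (parse_boot_vga_line(lines[i], out, outlen))
			found = 1;
	}
	return found;
}
```

Applied to the vgaarb lines quoted earlier in this thread, `last_boot_vga` reports `0000:0c:00.0`, the device that logged `(overriding previous)`.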

If you want to learn more about the background, AMD has a bunch of
documentation around this on their website:
https://www.amd.com/en/search/documentation/hub.html

The most interesting document for you is probably the BIOS programming
manual, but don't ask me what exactly the title of that one is. @Alex, do
you remember what that was called?

IIRC Intel had similar documentation public, but I don't know where to
find it offhand.

Regards,
Christian.

>
>
> Thanks.
>