Message ID | 20230904195724.633404-1-sui.jingfeng@linux.dev (mailing list archive) |
---|---|
Series | PCI/VGA: Allowing the user to select the primary video adapter at boot time |
On Tue, 05 Sep 2023, Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
> From: Sui Jingfeng <suijingfeng@loongson.cn>
>
> On a machine with multiple GPUs, a Linux user has no control over which
> one is primary at boot time. This series tries to solve above mentioned
> problem by introduced the ->be_primary() function stub. The specific
> device drivers can provide an implementation to hook up with this stub by
> calling the vga_client_register() function.
>
> Once the driver bound the device successfully, VGAARB will call back to
> the device driver. To query if the device drivers want to be primary or
> not. Device drivers can just pass NULL if have no such needs.
>
> Please note that:
>
> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
>    like to mount at least three video cards.
>
> 2) Typically, those non-86 machines don't have a good UEFI firmware
>    support, which doesn't support select primary GPU as firmware stage.
>    Even on x86, there are old UEFI firmwares which already made undesired
>    decision for you.
>
> 3) This series is attempt to solve the remain problems at the driver level,
>    while another series[1] of me is target to solve the majority of the
>    problems at device level.
>
> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> 630 is the default boot VGA, successfully override by ast2400 with
> ast.modeset=10 append at the kernel cmd line.

The value 10 is incredibly arbitrary, and multiplied as a magic number
all over the place.

> $ lspci | grep VGA
>
> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
> 04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
> 05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)

In this example, all of the GPUs are driven by different drivers. What
good does a module parameter do if you have multiple GPUs of the same
model, all driven by the same driver module?

BR,
Jani.

> $ sudo dmesg | grep vgaarb
>
> pci 0000:00:02.0: vgaarb: setting as boot VGA device
> pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> vgaarb: loaded
> ast 0000:04:00.0: vgaarb: Override as primary by driver
> i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> radeon 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> ast 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
>
> v2:
> * Add a simple implemment for drm/i915 and drm/ast
> * Pick up all tags (Mario)
> v3:
> * Fix a mistake for drm/i915 implement
> * Fix patch can not be applied problem because of merge conflect.
> v4:
> * Focus on solve the real problem.
>
> v1,v2 at https://patchwork.freedesktop.org/series/120059/
> v3 at https://patchwork.freedesktop.org/series/120562/
>
> [1] https://patchwork.freedesktop.org/series/122845/
>
> Sui Jingfeng (9):
>   PCI/VGA: Allowing the user to select the primary video adapter at boot
>     time
>   drm/nouveau: Implement .be_primary() callback
>   drm/radeon: Implement .be_primary() callback
>   drm/amdgpu: Implement .be_primary() callback
>   drm/i915: Implement .be_primary() callback
>   drm/loongson: Implement .be_primary() callback
>   drm/ast: Register as a VGA client by calling vga_client_register()
>   drm/hibmc: Register as a VGA client by calling vga_client_register()
>   drm/gma500: Register as a VGA client by calling vga_client_register()
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 11 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 13 ++++-
>  drivers/gpu/drm/ast/ast_drv.c                 | 31 ++++++++++
>  drivers/gpu/drm/gma500/psb_drv.c              | 57 ++++++++++++++++++-
>  .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 15 +++++
>  drivers/gpu/drm/i915/display/intel_vga.c      | 15 ++++-
>  drivers/gpu/drm/loongson/loongson_module.c    |  2 +-
>  drivers/gpu/drm/loongson/loongson_module.h    |  1 +
>  drivers/gpu/drm/loongson/lsdc_drv.c           | 10 +++-
>  drivers/gpu/drm/nouveau/nouveau_vga.c         | 11 +++-
>  drivers/gpu/drm/radeon/radeon_device.c        | 10 +++-
>  drivers/pci/vgaarb.c                          | 43 ++++++++++++--
>  drivers/vfio/pci/vfio_pci_core.c              |  2 +-
>  include/linux/vgaarb.h                        |  8 ++-
>  14 files changed, 210 insertions(+), 19 deletions(-)

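To make the objection above concrete, here is a minimal userspace sketch of a "driver asks to be primary" callback. It is only an illustration under stated assumptions: `struct mock_dev`, `mock_modeset`, `mock_wants_primary()` and `pick_primary()` are hypothetical names, not kernel API and not the code from this series, and the real series is described as calling back when a driver binds rather than scanning a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VGA device; this is not struct pci_dev. */
struct mock_dev {
	const char *slot;	/* e.g. "0000:04:00.0" */
	bool boot_vga;		/* selected by firmware */
	/* NULL means the driver expresses no preference. */
	bool (*be_primary)(const struct mock_dev *dev);
};

/* Stands in for a per-driver module parameter such as ast.modeset. */
static int mock_modeset = 10;

/*
 * A module parameter belongs to the driver, not to one device, so this
 * callback gives the same answer for every device the driver binds.
 */
static bool mock_wants_primary(const struct mock_dev *dev)
{
	(void)dev;
	return mock_modeset == 10;
}

/*
 * Return the first device whose driver asks to be primary, otherwise
 * the firmware's boot device, otherwise NULL.
 */
static const struct mock_dev *pick_primary(const struct mock_dev *devs, size_t n)
{
	const struct mock_dev *fallback = NULL;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].be_primary && devs[i].be_primary(&devs[i]))
			return &devs[i];
		if (devs[i].boot_vga && !fallback)
			fallback = &devs[i];
	}
	return fallback;
}
```

With two devices sharing `mock_wants_primary`, whichever is seen first wins, which is the same-driver case raised above: the parameter cannot distinguish between them.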
Hi

On 04.09.23 at 21:57, Sui Jingfeng wrote:
> From: Sui Jingfeng <suijingfeng@loongson.cn>
>
> On a machine with multiple GPUs, a Linux user has no control over which
> one is primary at boot time. This series tries to solve above mentioned

If anything, the primary graphics adapter is the one initialized by the
firmware. I think our boot-up graphics also make this assumption
implicitly.

But what's the use case for overriding this setting?

Best regards
Thomas

Hi

On 04.09.23 at 21:57, Sui Jingfeng wrote:
> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> 630 is the default boot VGA, successfully override by ast2400 with
> ast.modeset=10 append at the kernel cmd line.

FYI: per-driver modeset parameters are deprecated and not to be used.
Please don't promote them. You can use modprobe.blacklist or
initcall_blacklist on the kernel command line.

Best regards
Thomas

On 05.09.23 at 12:38, Jani Nikula wrote:
> On Tue, 05 Sep 2023, Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
>> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
>>    like to mount at least three video cards.

Well, you rarely find a board which can actually handle a single one :)

>> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
>> 630 is the default boot VGA, successfully override by ast2400 with
>> ast.modeset=10 append at the kernel cmd line.
> The value 10 is incredibly arbitrary, and multiplied as a magic number
> all over the place.

+1

> In this example, all of the GPUs are driven by different drivers. What
> good does a module parameter do if you have multiple GPUs of the same
> model, all driven by the same driver module?

Completely agree. The question is what the benefit is for the end user
of actually specifying this.

If you want the initial console on a different device, then implement a
kernel option for vgaarb and *not* for the drivers.

Regards,
Christian.

Hi,

On 2023/9/5 18:45, Thomas Zimmermann wrote:
> If anything, the primary graphics adapter is the one initialized by
> the firmware. I think our boot-up graphics also make this assumption
> implicitly.

Yes, but by the time the DRM drivers have loaded successfully, the
boot-up graphics have already finished. The firmware framebuffer device
has already been killed by the
drm_aperture_remove_conflicting_pci_framebuffers() function (or its
siblings). So this series is definitely not meant to interact with the
firmware framebuffer (or more intelligent framebuffer drivers). It is
for user-space programs such as the X server and Wayland compositors.
It is for Linux users and DRM driver testers: it allows them to direct
the graphical display server to use the hardware they are interested in
as the primary video card.

Also, I believe that the X server and Wayland compositors are the best
test examples. If a specific DRM driver can't work with the X server as
primary, then there is probably something wrong.

> But what's the use case for overriding this setting?

On a machine with multiple GPUs mounted, only the primary graphics
device gets POST-ed (initialized) by the firmware. Therefore, the DRM
drivers for the remaining video cards have to work without the
prerequisite setup done by the firmware, which is called POST.

One of the use cases of this series is to test whether a specific DRM
driver works properly even though no prerequisite work has been done by
the firmware at all. And it seems that the results are not satisfying
in all cases.

drm/ast is the first DRM driver that refused to work if not POST-ed by
the firmware.

Before applying this series, I was unable to make drm/ast the primary
video card easily. In a multiple video card configuration, the monitor
connected to the AST2400 did not light up. Being confused, a naive
programmer might suspect that PRIME is not working.

After applying this series and passing ast.modeset=10 on the kernel
command line, I found that the monitor connected to my ast2400 video
card was still black. It did not display anything.

While studying drm/ast, I learned that the drm/ast driver ships its own
POST code; see the ast_post_gpu() function. I then wondered why this
function didn't work. After a short (hasty) debugging session, I found
that the ast_post_gpu() function didn't get run, because it depends on
ast->config_mode.

Without thinking too much, I hardcoded ast->config_mode to ast_use_p2a
to force the ast_post_gpu() function to run.

```
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device *ast)
                }
        }
 
+       ast->config_mode = ast_use_p2a;
+
        switch (ast->config_mode) {
        case ast_use_defaults:
                drm_info(dev, "Using default configuration\n");
```

Then the monitor lit up and displayed the Ubuntu greeter.

Therefore, my patch is helpful, at least for Linux DRM driver testers
and developers. It allows programmers to test a specific part of a
specific driver without changing a line of the source code and without
the need for sudo authority. It helps to improve the efficiency of
testing and patch verification.

I know about the PrimaryGPU option in the Xorg configuration, but that
setting is persistent: you need to modify it with root authority each
time you want to switch the primary. When rapidly developing and/or
testing multiple video drivers with only one computer available, what
we really want is probably a one-shot command such as this series
provides.

So, this is the first use case. It probably also helps to test full
modeset, PRIME and reverse PRIME on a machine with multiple video
cards.

> Best regards
> Thomas

Hi,

On 2023/9/5 21:28, Christian König wrote:
>>> Tested (limited) on x86 with four video card mounted, Intel UHD
>>> Graphics 630 is the default boot VGA, successfully override by
>>> ast2400 with ast.modeset=10 append at the kernel cmd line.
>> The value 10 is incredibly arbitrary, and multiplied as a magic number
>> all over the place.
>
> +1

This is the exact reason why I made this series an RFC: this is an
open-ended problem. The choices 3, 4, 5, 6, 7, 8 and 9 are as arbitrary
as the number '10'. '1' and '2' are definitely not suitable, because
those seats have already been taken.

Take drm/nouveau as an example:

```
MODULE_PARM_DESC(modeset, "enable driver (default: auto, "
                          "0 = disabled, 1 = enabled, 2 = headless)");
int nouveau_modeset = -1;
module_param_named(modeset, nouveau_modeset, int, 0400);
```

'1' enables the DRM driver; some drivers even use it to override the
'nomodeset' parameter.

'2' is not suitable, because nouveau uses it for a headless GPU
(render-only or compute-class GPU?).

'3' is also not likely the best; the concern is: what if a specific DRM
driver wants to expand the usage in the future?

The reasons I picked the number '10' are:

1) The modeset parameter is unlikely to be expanded up to 10 usages.
   Other DRM drivers only use '-1', '0' and '1'; choosing '2' would
   conflict with drm/nouveau. Picking '10' leaves some room for the
   various device driver authors. It also helps to keep the usage
   consistent across drivers.

2) An int takes up 4 bytes, and I don't want to waste even a single
   byte. In defending my patch, I have to say that drafting another
   kernel command line parameter would waste precious RAM. An int can
   have 2^31 usages, so why can't we improve the utilization rate?

3) Please consider the fact that modeset is the most common and
   attractive parameter. No name is better than 'modeset', as other
   names are not easy to remember.

Again, this is for Linux users, thus it is not arbitrary. Although it
is simple and trivial, I thought about it for more than a week.

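The value assignments discussed above can be written down as named constants, which is one way to see both the collision argument and the "magic number" objection side by side. This is only a sketch: the enum and helper names are hypothetical, the values -1, 0, 1 and 2 are taken from the nouveau description quoted above, and 10 is the value proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the modeset values mentioned in this thread. */
enum modeset_value {
	MODESET_AUTO     = -1,	/* nouveau default */
	MODESET_DISABLED = 0,
	MODESET_ENABLED  = 1,
	MODESET_HEADLESS = 2,	/* nouveau only, per its MODULE_PARM_DESC */
	MODESET_PRIMARY  = 10,	/* value proposed by this series */
};

/* The proposed value must not collide with an existing meaning. */
static_assert(MODESET_PRIMARY != MODESET_HEADLESS, "value already taken");
static_assert(MODESET_PRIMARY != MODESET_ENABLED, "value already taken");

/* One shared helper instead of comparing against a literal 10 per driver. */
static bool modeset_wants_primary(int modeset)
{
	return modeset == MODESET_PRIMARY;
}
```

A shared definition like this would at least keep the number in one place; it does not address the objection that a per-driver parameter cannot select between two devices bound to the same driver.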
On Tue, 5 Sep 2023 03:57:15 +0800 Sui Jingfeng <sui.jingfeng@linux.dev> wrote:
> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> 630 is the default boot VGA, successfully override by ast2400 with
> ast.modeset=10 append at the kernel cmd line.
>
> $ lspci | grep VGA
>
> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]

In all my previous experiments with VGA routing and IGD I found that
IGD can't actually release VGA routing and Intel confirmed the hardware
doesn't have the ability to do so. It will always be primary from a
VGA routing perspective. Was this actually tested with non-UEFI?

I suspect it might only work in UEFI mode where we probably don't
actually have a dependency on VGA routing. This is essentially why
vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
broken to use on Intel systems with IGD. Thanks,

Alex

Hi

On 05.09.23 at 15:30, suijingfeng wrote:
> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>> If anything, the primary graphics adapter is the one initialized by
>> the firmware. I think our boot-up graphics also make this assumption
>> implicitly.
>
> Yes, but by the time of DRM drivers get loaded successfully,the boot-up
> graphics already finished.
> Firmware framebuffer device already get killed by the
> drm_aperture_remove_conflicting_pci_framebuffers()
> function (or its siblings). So, this series is definitely not to
> interact with the firmware framebuffer

Yes and no. The helpers you mention will attempt to remove the firmware
framebuffer on the given PCI device. If you have multiple PCI devices,
the other devices would not be affected. This also means that probing a
non-primary card will not affect the firmware framebuffer on the
primary card. You can have all these drivers co-exist next to each
other. If you link a full DRM driver into the kernel image, it might
even be loaded before the firmware-framebuffer's driver. We had some
funny bugs from these interactions.

> (or more intelligent framebuffer drivers). It is for user space
> program, such as X server and Wayland compositor. Its for Linux user
> or drm drivers testers, which allow them to direct graphic display
> server using right hardware of interested as primary video card.
>
> Also, I believe that X server and Wayland compositor are the best test
> examples.
> If a specific DRM driver can't work with X server as a primary,
> then there probably have something wrong.

If you want to run a userspace compositor or X11 on a certain device,
you best configure this in the program's config files, but not on the
kernel command line.

The whole concept of a 'primary' display is bogus IMHO. It only exists
because old VGA and BIOS (and their equivalents on non-PC systems) were
unable to use more than one graphics device. Hence, as you write below,
only the first device got POSTed by the BIOS. If you had an additional
card, the device driver needed to perform the POSTing.

However, on modern Linux systems the primary display does not really
exist. 'Primary' is the device that is available via VGA, VESA or EFI.
Our drivers don't use these interfaces, but the native registers. As
you said yourself, these firmware devices (VGA, VESA, EFI) are removed
ASAP by the native drivers.

>> But what's the use case for overriding this setting?
>
> On a specific machine with multiple GPUs mounted,
> only the primary graphics get POST-ed (initialized) by the firmware.
> Therefore, the DRM drivers for the rest video cards, have to choose to
> work without the prerequisite setups done by firmware, This is called
> as POST.
>
> One of the use cases of this series is to test if a specific DRM driver
> could works properly, even though there is no prerequisite works have
> been done by firmware at all.
> And it seems that the results is not satisfying in all cases.
>
> drm/ast is the first drm drivers which refused to work if not being
> POST-ed by the firmware.

You might have found a bug in the ast driver. Ast has means to detect
if the device has been POSTed and maybe do that. If this doesn't work
correctly, it needs a fix.

As Christian mentioned, if anything, you might add an option to specify
the default card to vgaarb (e.g., as PCI slot). But userspace should
avoid the idea of a primary card IMHO.

Best regards
Thomas

This probably also help to test full > modeset, > PRIME and reverse PRIME on multiple video card machine. > > >> Best regards >> Thomas >> >
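The suggestion earlier in this message, telling vgaarb the default card by PCI slot, would first need to turn a slot string into a PCI address. The following is only a minimal userspace sketch in C, assuming the full `domain:bus:device.function` hexadecimal notation that appears in the `dmesg` output in this thread. `struct pci_slot` and `parse_pci_slot()` are hypothetical names chosen for illustration; they are not kernel API and not part of the posted series.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a PCI address; not a kernel type. */
struct pci_slot {
	unsigned int domain;
	unsigned int bus;
	unsigned int dev;
	unsigned int fn;
};

/*
 * Parse a string such as "0000:04:00.0" (all fields hexadecimal).
 * Returns 0 and fills *out on success, -1 on malformed input.
 * sscanf() tolerates some leading whitespace, so a real parser
 * may want to be stricter than this sketch.
 */
int parse_pci_slot(const char *s, struct pci_slot *out)
{
	struct pci_slot tmp;
	int consumed = 0;

	assert(s != NULL && out != NULL);

	if (sscanf(s, "%4x:%2x:%2x.%1x%n",
		   &tmp.domain, &tmp.bus, &tmp.dev, &tmp.fn, &consumed) != 4)
		return -1;

	/* Reject trailing characters after the function number. */
	if (s[consumed] != '\0')
		return -1;

	/* A PCI bus has at most 32 devices with 8 functions each. */
	if (tmp.dev > 0x1f || tmp.fn > 7)
		return -1;

	*out = tmp;
	return 0;
}
```

In kernel code, the parsed fields would typically be compared against `pci_domain_nr()`, the bus number, `PCI_SLOT()` and `PCI_FUNC()` of each VGA device; how such an option would actually be wired into vgaarb is left open here.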
On 2023/9/5 18:49, Thomas Zimmermann wrote: > Hi > > Am 04.09.23 um 21:57 schrieb Sui Jingfeng: >> From: Sui Jingfeng <suijingfeng@loongson.cn> >> >> On a machine with multiple GPUs, a Linux user has no control over which >> one is primary at boot time. This series tries to solve above mentioned >> problem by introduced the ->be_primary() function stub. The specific >> device drivers can provide an implementation to hook up with this >> stub by >> calling the vga_client_register() function. >> >> Once the driver bound the device successfully, VGAARB will call back to >> the device driver. To query if the device drivers want to be primary or >> not. Device drivers can just pass NULL if have no such needs. >> >> Please note that: >> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would >> like to mount at least three video cards. >> >> 2) Typically, those non-86 machines don't have a good UEFI firmware >> support, which doesn't support select primary GPU as firmware stage. >> Even on x86, there are old UEFI firmwares which already made >> undesired >> decision for you. >> >> 3) This series is attempt to solve the remain problems at the driver >> level, >> while another series[1] of me is target to solve the majority of the >> problems at device level. >> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics >> 630 is the default boot VGA, successfully override by ast2400 with >> ast.modeset=10 append at the kernel cmd line. > > FYI: per-driver modeset parameters are deprecated and not to be used. > Please don't promote them. Well, please wait, I want to explain. drm/nouveau already promote it a little bit. Despite no code of conduct or specification guiding how the modules parameters should be. Noticed that there already have a lot of DRM drivers support the modeset parameters, for the modeset parameter, authors of various device driver try to make the usage not conflict with others. 
I believe that this is a good thing for Linux users. It is probably the responsibility of the DRM core maintainers to push the various DRM drivers toward a minimal consensus. It is probably painful to do so and may not pay off, but reaching a minimal consensus does benefit Linux users.

> You can use modprobe.blacklist or initcall_blacklist on the kernel
> command line.
>

There are some cases where modprobe.blacklist doesn't work; I have come across them several times in the past. The device selected by VGAARB is a device-level decision, not the driver's problem. Sometimes, when VGAARB has a bug, it selects the wrong device as primary. The X server then uses this wrong device as primary and crashes completely, because there is no driver for it.

Take my old S3 Graphics card as an example:

$ lspci | grep VGA

00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)

Before applying this patch:

[ 0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[ 0.361753] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 0.361765] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 0.361773] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 0.361779] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 0.361781] vgaarb: loaded
[ 0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
[ 0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
[ 0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070

For some reason, one of my systems selects the S3 Graphics card as the primary GPU.
But this S3 Graphics card does not even have a decent upstream DRM driver yet. In such a case, I begin to believe that only a device that has a driver deserves to be primary.

In this situation, I want to reboot and enter the graphical environment with one of the other, working video cards, either the platform-integrated GPU or a discrete one. This doesn't mean I should compromise by removing the S3 Graphics card from the motherboard, and it doesn't mean I should change my BIOS settings either, as sometimes the BIOS is even worse.

With this series applied, all I need to do is reboot the computer and pass a kernel command line option. By forcing another video card (one that has decent driver support) to be primary, I'm able to do the debugging in a graphical environment. I would like to examine what is wrong with vgaarb on a specific platform from within the X server environment, perhaps compile a driver for this card and see whether it works, and simply reboot without needing to change anything else. It is very efficient.

So this is probably the second use case of my patch. It hands control back to the graphics developer.
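The selection logic described in the cover letter (the arbiter asks each registered driver whether it wants to be primary, and a driver may pass NULL) can be modelled in plain C. This is only a userspace sketch under stated assumptions: `struct fake_vga_dev`, `pick_primary()` and the `bool (*)(void)` callback signature are hypothetical stand-ins, and the real `->be_primary()` prototype in the series may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a VGA device known to the arbiter. */
struct fake_vga_dev {
	const char *name;
	bool boot_vga;            /* set for the device the firmware chose */
	bool (*be_primary)(void); /* assumed driver hook; NULL means "no opinion" */
};

/*
 * Return the first device whose driver asks to be primary. If no
 * driver asks, fall back to the firmware's boot VGA device. Returns
 * NULL if neither exists.
 */
const struct fake_vga_dev *pick_primary(const struct fake_vga_dev *devs,
					size_t n)
{
	const struct fake_vga_dev *fallback = NULL;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (devs[i].be_primary && devs[i].be_primary())
			return &devs[i];
		if (devs[i].boot_vga && !fallback)
			fallback = &devs[i];
	}
	return fallback;
}

/* Example driver hooks, e.g. backed by a module parameter in a real driver. */
bool always_yes(void) { return true; }
bool always_no(void) { return false; }
```

In this model a device without a driver, like the S3 card above, never has a callback, so it can only become primary through the firmware fallback.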
Hi, On 2023/9/5 22:52, Alex Williamson wrote: > On Tue, 5 Sep 2023 03:57:15 +0800 > Sui Jingfeng <sui.jingfeng@linux.dev> wrote: > >> From: Sui Jingfeng <suijingfeng@loongson.cn> >> >> On a machine with multiple GPUs, a Linux user has no control over which >> one is primary at boot time. This series tries to solve above mentioned >> problem by introduced the ->be_primary() function stub. The specific >> device drivers can provide an implementation to hook up with this stub by >> calling the vga_client_register() function. >> >> Once the driver bound the device successfully, VGAARB will call back to >> the device driver. To query if the device drivers want to be primary or >> not. Device drivers can just pass NULL if have no such needs. >> >> Please note that: >> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would >> like to mount at least three video cards. >> >> 2) Typically, those non-86 machines don't have a good UEFI firmware >> support, which doesn't support select primary GPU as firmware stage. >> Even on x86, there are old UEFI firmwares which already made undesired >> decision for you. >> >> 3) This series is attempt to solve the remain problems at the driver level, >> while another series[1] of me is target to solve the majority of the >> problems at device level. >> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics >> 630 is the default boot VGA, successfully override by ast2400 with >> ast.modeset=10 append at the kernel cmd line. >> >> $ lspci | grep VGA >> >> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] > In all my previous experiments with VGA routing and IGD I found that > IGD can't actually release VGA routing and Intel confirmed the hardware > doesn't have the ability to do so. It will always be primary from a > VGA routing perspective. Was this actually tested with non-UEFI? 
Yes, I have tested on my Aspire E471 notebook (i5 5200U), because that notebook uses legacy firmware (it also has UEFI, so it has dual firmware). I had difficulty installing Ubuntu on this machine under UEFI firmware in the past, so I keep it on the legacy firmware.

It has two video cards: the IGD and an NVIDIA card (GeForce 840M). NVIDIA declares its card as a 3D controller (pci->class = 0x030200).

I have tested this patch together with the other patch mentioned at [1]. I can tell you that the firmware framebuffer of this notebook uses vesafb, not efifb, and the framebuffer size (lfb.size) is very small. This is very strange, but I don't have enough time to look into the details. It still works.

I'm using and testing my patch whenever and wherever possible.

> I suspect it might only work in UEFI mode where we probably don't
> actually have a dependency on VGA routing. This is essentially why
> vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> broken to use on Intel systems with IGD. Thanks,

What you describe here is a side effect that comes with VGA compatibility, but I'm focused on the arbitration itself. I think there is no need to keep the VGA routing hardware features nowadays, except where hardware vendors want to keep backward compatibility and/or comply with the PCI VGA-compatible spec.

> Alex
>
On Wed, 6 Sep 2023 00:21:09 +0800 suijingfeng <suijingfeng@loongson.cn> wrote: > Hi, > > On 2023/9/5 22:52, Alex Williamson wrote: > > On Tue, 5 Sep 2023 03:57:15 +0800 > > Sui Jingfeng <sui.jingfeng@linux.dev> wrote: > > > >> From: Sui Jingfeng <suijingfeng@loongson.cn> > >> > >> On a machine with multiple GPUs, a Linux user has no control over which > >> one is primary at boot time. This series tries to solve above mentioned > >> problem by introduced the ->be_primary() function stub. The specific > >> device drivers can provide an implementation to hook up with this stub by > >> calling the vga_client_register() function. > >> > >> Once the driver bound the device successfully, VGAARB will call back to > >> the device driver. To query if the device drivers want to be primary or > >> not. Device drivers can just pass NULL if have no such needs. > >> > >> Please note that: > >> > >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would > >> like to mount at least three video cards. > >> > >> 2) Typically, those non-86 machines don't have a good UEFI firmware > >> support, which doesn't support select primary GPU as firmware stage. > >> Even on x86, there are old UEFI firmwares which already made undesired > >> decision for you. > >> > >> 3) This series is attempt to solve the remain problems at the driver level, > >> while another series[1] of me is target to solve the majority of the > >> problems at device level. > >> > >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics > >> 630 is the default boot VGA, successfully override by ast2400 with > >> ast.modeset=10 append at the kernel cmd line. > >> > >> $ lspci | grep VGA > >> > >> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] > > In all my previous experiments with VGA routing and IGD I found that > > IGD can't actually release VGA routing and Intel confirmed the hardware > > doesn't have the ability to do so. 
It will always be primary from a > > VGA routing perspective. Was this actually tested with non-UEFI? > > Yes, I have tested on my aspire e471 notebook (i5 5200U), > because that notebook using legacy firmware (also have UEFI, double firmware). > But this machine have difficult in install ubuntu under UEFI firmware in the past. > So I keep it using the legacy firmware. > > It have two video card, IGD and nvidia video card(GFORCE 840M). > nvidia call its video card as 3D controller (pci->class = 0x030200) > > I have tested this patch and another patch mention at [1] together. > I can tell you that the firmware framebuffer of this notebook using vesafb, not efifb. > And the framebuffer size (lfb.size) is very small. This is very strange, > but I don't have enough time to look in details. But still works. > > I'm using and tesing my patch whenever and wherever possible. So you're testing VGA routing using a non-VGA 3D controller through the VESA address space? How does that test anything about VGA routing? > > I suspect it might only work in UEFI mode where we probably don't > > actually have a dependency on VGA routing. This is essentially why > > vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too > > broken to use on Intel systems with IGD. Thanks, > > > What you tell me here is the side effect come with the VGA-compatible, > but I'm focus on the arbitration itself. I think there no need to keep > the VGA routing hardware features nowadays except that hardware vendor > want keep the backward compatibility and/or comply the PCI VGA compatible spec. "VGA arbitration" is the mediation of VGA routing between devices, so I'm confused how you can be focused on the arbitration without the routing itself. Thanks, Alex
Hi,

On 2023/9/5 23:05, Thomas Zimmermann wrote:
> However, on modern Linux systems the primary display does not really
> exist.

No, it does exist. The X server needs to know which one is the primary GPU. The '*' character in front of the (4@0:0:0) PCI device denotes the primary; see the log below.

(II) xfree86: Adding drm device (/dev/dri/card2)
(II) xfree86: Adding drm device (/dev/dri/card0)
(II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.5/0000:003:00.0/0000:04:00.0/drm/card0
(II) xfree86: Adding drm device (/dev/dri/card3)
(II) Platform probe for /sys/devices/pci0000:00/0000:00:1c.6/0000:005:00.0/drm/card3
(--) PCI: (0@0:2:0) 8086:3e91:8086:3e91 rev 0, Mem @ 0xdb000000/167777216, 0xa0000000/536870912, I/O @ 0x0000f000/64, BIOS @ 0x????????/131072
(--) PCI: (1@0:0:0) 1002:6771:1043:8636 rev 0, Mem @ 0xc0000000/2688435456, 0xdf220000/131072, I/O @ 0x0000e000/256, BIOS @ 0x????????/131072
(--) PCI:*(4@0:0:0) 1a03:2000:1a03:2000 rev 48, Mem @ 0xde000000/166777216, 0xdf020000/131072, I/O @ 0x0000c000/128, BIOS @ 0x????????/131072
(--) PCI: (5@0:0:0) 10de:1288:174b:b324 rev 161, Mem @ 0xdc000000/116777216, 0xd0000000/134217728, 0xd8000000/33554432, I/O @ 0x0000b000/128, BIOS @@0x????????/524288

The modesetting driver of the X server creates the framebuffer on the primary video adapter. If a 2D video adapter (like the ASPEED BMC) is not the primary, it probably will not be used. Its only chance to display something is to function as an output slave, but the output slave technique needs PRIME support for cross-driver buffer sharing.

So there is a difference between the primary and the non-primary video adapters.

> 'Primary' is the device that is available via VGA, VESA or EFI. Our
> drivers don't use these interfaces, but the native registers. As you
> said yourself, these firmware devices (VGA, VESA, EFI) are removed
> ASAP by the native drivers.
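To check which adapter a given boot ended up with as primary, the `PCI:*(` marker in the X server log quoted above can be picked out mechanically. The following is a small userspace sketch in C that relies only on the line format visible in that log; `xorg_primary_busid()` is a hypothetical helper name, and other X server versions may format these lines differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If the log line marks the primary adapter ("PCI:*(" followed by a
 * bus ID and ")"), copy the bus ID into busid and return 1.
 * Return 0 for any other line, or if busid is too small.
 */
int xorg_primary_busid(const char *line, char *busid, size_t len)
{
	static const char marker[] = "PCI:*(";
	const char *start, *end;
	size_t n;

	assert(line != NULL && busid != NULL && len > 0);

	start = strstr(line, marker);
	if (!start)
		return 0;
	start += strlen(marker);

	end = strchr(start, ')');
	if (!end)
		return 0;

	n = (size_t)(end - start);
	if (n >= len)
		return 0;

	memcpy(busid, start, n);
	busid[n] = '\0';
	return 1;
}
```

Fed the four `(--) PCI:` lines from the log above, only the ASPEED line yields a bus ID, namely `4@0:0:0`.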
On 2023/9/5 23:05, Thomas Zimmermann wrote:
> Hi
>
> Am 05.09.23 um 15:30 schrieb suijingfeng:
>> Hi,
>>
>>
>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>
>>>> On a machine with multiple GPUs, a Linux user has no control over
>>>> which
>>>> one is primary at boot time. This series tries to solve above
>>>> mentioned
>>>
>>> If anything, the primary graphics adapter is the one initialized by
>>> the firmware. I think our boot-up graphics also make this assumption
>>> implicitly.
>>>
>>
>> Yes, but by the time of DRM drivers get loaded successfully,the
>> boot-up graphics already finished.
>> Firmware framebuffer device already get killed by the
>> drm_aperture_remove_conflicting_pci_framebuffers()
>> function (or its siblings). So, this series is definitely not to
>> interact with the firmware framebuffer
>
> Yes and no. The helpers you mention will attempt to remove the
> firmware framebuffer on the given PCI device. If you have multiple PCI
> devices, the other devices would not be affected.
>

Yes and no.

For the yes part: drm_aperture_remove_conflicting_pci_framebuffers() only kills the conflicting one. But on a machine with modern UEFI firmware there should be only one firmware framebuffer driver, and that should be EFIFB (UEFI GOP). I do have multiple PCI devices, but I don't understand when and why a system would have more than one firmware framebuffer.

Even on machines with a legacy BIOS, the fixed VGA aperture address range can only be owned by one firmware driver. We just need to handle the routing, and the ->set_decode() callback passed to vga_client_register() is used to do that work.

Am I correct?
Hi,

On 2023/9/5 23:05, Thomas Zimmermann wrote:
> However, on modern Linux systems the primary display does not really
> exist. 'Primary' is the device that is available via VGA, VESA or EFI.

I may be missing the point. What do you mean by the word "modern"? Are you trying to tell me that the X server is too old and Wayland is the modern display server?

> Our drivers don't use these interfaces, but the native registers.

Yes and no?

Yes for machines with UEFI firmware, but I'm not sure this statement is true for machines with legacy firmware. The display controller in the ASpeed BMC is VGA compatible, so in theory it should work with the VGA console on a machine that has another VGA-compatible video card. So the ast_vga_set_decode() function provided in patch 0007 is probably useful in a legacy firmware environment.

To be honest, I have tested this on various machines with UEFI firmware, but I didn't realize that I should also test in a legacy firmware environment before sending this patch. The testing effort needed seems quite exhausting, since all my machines come with UEFI firmware.

So is it OK to leave the legacy part to someone else who is interested in it? Alex is probably more experienced with legacy VGA routing? :-)
Hi, On 2023/9/5 22:52, Alex Williamson wrote: > On Tue, 5 Sep 2023 03:57:15 +0800 > Sui Jingfeng <sui.jingfeng@linux.dev> wrote: > >> From: Sui Jingfeng <suijingfeng@loongson.cn> >> >> On a machine with multiple GPUs, a Linux user has no control over which >> one is primary at boot time. This series tries to solve above mentioned >> problem by introduced the ->be_primary() function stub. The specific >> device drivers can provide an implementation to hook up with this stub by >> calling the vga_client_register() function. >> >> Once the driver bound the device successfully, VGAARB will call back to >> the device driver. To query if the device drivers want to be primary or >> not. Device drivers can just pass NULL if have no such needs. >> >> Please note that: >> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would >> like to mount at least three video cards. >> >> 2) Typically, those non-86 machines don't have a good UEFI firmware >> support, which doesn't support select primary GPU as firmware stage. >> Even on x86, there are old UEFI firmwares which already made undesired >> decision for you. >> >> 3) This series is attempt to solve the remain problems at the driver level, >> while another series[1] of me is target to solve the majority of the >> problems at device level. >> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics >> 630 is the default boot VGA, successfully override by ast2400 with >> ast.modeset=10 append at the kernel cmd line. >> >> $ lspci | grep VGA >> >> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] > In all my previous experiments with VGA routing and IGD I found that > IGD can't actually release VGA routing and Intel confirmed the hardware > doesn't have the ability to do so. Which model of the IGD you are using? even for the IGD in Atom D2550, the legacy 128KB VGA memory range can be tuned to be mapped to IGD or to the DMI Interface. 
See section 1.7.3.2 of the N2000 datasheet [1]. If a specific Intel model has a bug in its VGA routing logic, I would prefer to ignore it, or to switch to UEFI firmware on such hardware. That is the hardware engineers' responsibility; I will not worry about it. Thanks for telling me this.

[1] https://www.intel.com/content/dam/doc/datasheet/atom-d2000-n2000-vol-2-datasheet.pdf

> It will always be primary from a
> VGA routing perspective. Was this actually tested with non-UEFI?

As you already said, Intel has confirmed the hardware defect. So this is probably a good opportunity to switch to UEFI to avoid the problem; then no testing on legacy firmware is needed.

> I suspect it might only work in UEFI mode where we probably don't
> actually have a dependency on VGA routing. This is essentially why
> vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> broken to use on Intel systems with IGD. Thanks,

Thanks for telling me this. To be honest, I have only tested my patch on machines with UEFI firmware, since UEFI has become mainstream. If this patch is really useful for the majority of machines, I'm satisfied. The result is not too bad. Thanks.

> Alex
>
Hi,

On 2023/9/5 23:05, Thomas Zimmermann wrote:
> You might have found a bug in the ast driver. Ast has means to detect
> if the device has been POSTed and maybe do that. If this doesn't work
> correctly, it needs a fix.
>

That sounds fine. The bug is not a big deal; I am just taking it as an example and reporting it to you. But a real fix can be complex, because quite a lot of servers ship with ASpeed BMC hardware. Honestly, I don't have the time to fix it properly. I already have tons of patches pending, and I will focus on solving the VGAARB-related problems. I also want to test your patches occasionally, so this series is useful to me in corner cases.
Am 05.09.23 um 15:30 schrieb suijingfeng: > Hi, > > > On 2023/9/5 18:45, Thomas Zimmermann wrote: >> Hi >> >> Am 04.09.23 um 21:57 schrieb Sui Jingfeng: >>> From: Sui Jingfeng <suijingfeng@loongson.cn> >>> >>> On a machine with multiple GPUs, a Linux user has no control over which >>> one is primary at boot time. This series tries to solve above mentioned >> >> If anything, the primary graphics adapter is the one initialized by >> the firmware. I think our boot-up graphics also make this assumption >> implicitly. >> > > Yes, but by the time of DRM drivers get loaded successfully,the > boot-up graphics already finished. This is an incorrect assumption. drm_aperture_remove_conflicting_pci_framebuffers() and co don't kill the framebuffer, they just remove the current framebuffer driver to avoid further updates. So what happens (at least for amdgpu) is that we take over the framebuffer, including both mode and it's contents, and provide a new framebuffer interface until DRM masters like X or Wayland take over. > Firmware framebuffer device already get killed by the > drm_aperture_remove_conflicting_pci_framebuffers() > function (or its siblings). So, this series is definitely not to > interact with the firmware framebuffer > (or more intelligent framebuffer drivers). It is for user space > program, such as X server and Wayland > compositor. Its for Linux user or drm drivers testers, which allow > them to direct graphic display server > using right hardware of interested as primary video card. > > Also, I believe that X server and Wayland compositor are the best test > examples. > If a specific DRM driver can't work with X server as a primary, > then there probably have something wrong. > > >> But what's the use case for overriding this setting? >> > > On a specific machine with multiple GPUs mounted, > only the primary graphics get POST-ed (initialized) by the firmware. 
> Therefore, the DRM drivers for the rest video cards, have to choose to > work without the prerequisite setups done by firmware, This is called > as POST. Well, you don't seem to understand the background here. This is perfectly normal behavior. Secondary cards are posted after loading the appropriate DRM driver. At least for amdgpu this is done by calling the appropriate functions in the BIOS. > > One of the use cases of this series is to test if a specific DRM > driver could works properly, > even though there is no prerequisite works have been done by firmware > at all. > And it seems that the results is not satisfying in all cases. > > drm/ast is the first drm drivers which refused to work if not being > POST-ed by the firmware. As far as I know this is expected as well. AST is a relatively simple driver and when it's not the primary one during boot the assumption is that it isn't used at all. Regards, Christian. > > Before apply this series, I was unable make drm/ast as the primary > video card easily. On a > multiple video card configuration, the monitor connected with the > AST2400 not light up. > While confusing, a naive programmer may suspect the PRIME is not working. > > After applied this series and passing ast.modeset=10 on the kernel cmd > line, > I found that the monitor connected with my ast2400 video card still > black, > It doesn't display and doesn't show image to me. > > While in the process of study drm/ast, I know that drm/ast driver has > the POST code shipped. > See the ast_post_gpu() function, then, I was wondering why this > function doesn't works. > After a short-time (hasty) debugging, I found that the the > ast_post_gpu() function > didn't get run. Because it have something to do with the > ast->config_mode. > > Without thinking too much, I hardcoded the ast->config_mode as > ast_use_p2a to > force the ast_post_gpu() function get run. 
> > ``` > > --- a/drivers/gpu/drm/ast/ast_main.c > +++ b/drivers/gpu/drm/ast/ast_main.c > @@ -132,6 +132,8 @@ static int ast_device_config_init(struct > ast_device *ast) > } > } > > + ast->config_mode = ast_use_p2a; > + > switch (ast->config_mode) { > case ast_use_defaults: > drm_info(dev, "Using default configuration\n"); > > ``` > > Then, the monitor light up, it display the Ubuntu greeter to me. > Therefore, my patch is helpful, at lease for the Linux drm driver > tester and developer. > It allow programmers to test the specific part of the specific drive > without changing a line of the source code and without the need of > sudo authority. > It helps to improve efficiency of the testing and patch verification. > > I know the PrimaryGPU option of Xorg conf, but this approach will > remember the setup > have been made, you need modify it with root authority each time you > want to switch > the primary. But on rapid developing and/or testing multiple video > drivers, with > only one computer hardware resource available. What we really want > probably is a > one-shoot command as this series provide. > > So, this is the first use case. This probably also help to test full > modeset, > PRIME and reverse PRIME on multiple video card machine. > > >> Best regards >> Thomas >> >
Am 05.09.23 um 16:28 schrieb Sui Jingfeng: > Hi, > > On 2023/9/5 21:28, Christian König wrote: >>>> >>>> 2) Typically, those non-86 machines don't have a good UEFI firmware >>>> support, which doesn't support select primary GPU as firmware >>>> stage. >>>> Even on x86, there are old UEFI firmwares which already made >>>> undesired >>>> decision for you. >>>> >>>> 3) This series is attempt to solve the remain problems at the >>>> driver level, >>>> while another series[1] of me is target to solve the majority >>>> of the >>>> problems at device level. >>>> >>>> Tested (limited) on x86 with four video card mounted, Intel UHD >>>> Graphics >>>> 630 is the default boot VGA, successfully override by ast2400 with >>>> ast.modeset=10 append at the kernel cmd line. >>> The value 10 is incredibly arbitrary, and multiplied as a magic number >>> all over the place. >> >> +1 > > > This is the exact reason why I made this series as RFC, because this > is a open-ended problem. > The choices of 3,4,5,6,7,8 and 9 are as arbitrary as the number of > '10'. '1' and '2' is > definitely not suitable, because the seat has already been taken. Well you are completely missing the point. *DON'T* abuse the modeset module parameters for this! If you use 10 or any other value doesn't matter. Regards, Christian. > > Take the drm/nouveau as an example: > > > ``` > > MODULE_PARM_DESC(modeset, "enable driver (default: auto, " > "0 = disabled, 1 = enabled, 2 = headless)"); > int nouveau_modeset = -1; > module_param_named(modeset, nouveau_modeset, int, 0400); > > ``` > > > '1' is for enable the drm driver, some driver even override the > 'nomodeset' parameter. > > '2' is not suitable, because nouveau use it as headless GPU > (render-only or compute class GPU?) > > '3' is also not likely the best, the concerns is that > what if a specific drm driver want to expand the usage in the future? 
> > > The reason I pick up the digit '10' is that > > > 1) The modeset parameter is unlikely to get expanded up to 10 usages. > > Other drm drivers only use the '-1', '0' and 1, choose '2' will > conflict with drm/nouveau. > By pick the digit '10', it leave some space(room) to various device > driver authors. > It also helps to keep the usage consistent across various drivers. > > > 2) An int taken up 4 byte, I don't want to waste even a single byte, > > While in the process of defencing my patch, I have to say > draft another kernel command line would cause the wasting of precious > RAM storage. > > An int can have 2^31 usage, why we can't improve the utilization rate? > > 3) Please consider the fact that the modeset is the most common and > attractive parameter > > No name is better than the 'modeset', as other name is not easy to > remember. > > Again, this is for Linux user, thus it is not arbitrary. > Despite simple and trivial, I think about it more than one week. >
Hi

On 06.09.23 at 04:14, suijingfeng wrote:
> Hi,
>
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> However, on modern Linux systems the primary display does not really
>> exist.
>
> No, it do exist. X server need to know which one is the primary GPU.
> The '*' character at the of (4@0:0:0) PCI device is the Primary.
> The '*' denote primary, see the log below.
>
> (II) xfree86: Adding drm device (/dev/dri/card2)
> (II) xfree86: Adding drm device (/dev/dri/card0)
> (II) Platform probe for
> /sys/devices/pci0000:00/0000:00:1c.5/0000:003:00.0/0000:04:00.0/drm/card0
> (II) xfree86: Adding drm device (/dev/dri/card3)
> (II) Platform probe for
> /sys/devices/pci0000:00/0000:00:1c.6/0000:005:00.0/drm/card3
> (--) PCI: (0@0:2:0) 8086:3e91:8086:3e91 rev 0, Mem @
> 0xdb000000/167777216, 0xa0000000/536870912, I/O @ 0x0000f000/64, BIOS @
> 0x????????/131072
> (--) PCI: (1@0:0:0) 1002:6771:1043:8636 rev 0, Mem @
> 0xc0000000/2688435456, 0xdf220000/131072, I/O @ 0x0000e000/256, BIOS @
> 0x????????/131072
> (--) PCI:*(4@0:0:0) 1a03:2000:1a03:2000 rev 48, Mem @
> 0xde000000/166777216, 0xdf020000/131072, I/O @ 0x0000c000/128, BIOS @
> 0x????????/131072
> (--) PCI: (5@0:0:0) 10de:1288:174b:b324 rev 161, Mem @
> 0xdc000000/116777216, 0xd0000000/134217728, 0xd8000000/33554432, I/O @
> 0x0000b000/128, BIOS @@0x????????/524288
>
> The modesetting driver of X server will create framebuffer on the
> primary video adapter.
> If a 2D video adapter (like the aspeed BMC) is not the primary, then it
> probably will not be used. The only chance to be able to display
> something is to functional as a output slave.
> But the output slave technology need the PRIME support for cross driver
> buffer sharing.
>
> So, there do have some difference between the primary and non-primary
> video adapters.

Xorg is a pretty bad example, because X parses the PCI bus and then
tries to match devices to /dev/dri/ files. That's also not fixable in
Xorg's current code base.

Please don't promote Xorg's design. It dates back to the time when Xorg
did the modesetting by itself.

Userspace should just open existing device files and start rendering.
Maybe pick the previous settings and/or do some guesswork about the
arrangement of these devices. AFAIK that's what the modern compositors
do.

Best regards
Thomas

>
>> 'Primary' is the device that is available via VGA, VESA or EFI. Our
>> drivers don't use these interfaces, but the native registers. As you
>> said yourself, these firmware devices (VGA, VESA, EFI) are removed
>> ASAP by the native drivers.
>
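For readers who want to see which adapter the kernel itself regards as
the boot VGA device (the one the Xorg log above marks with '*'), the
PCI core exposes a boot_vga attribute in sysfs for VGA-class devices.
The following is only a minimal userspace sketch, not anything taken
from the series: the helper names parse_boot_vga() and find_boot_vga()
are made up for illustration, and it assumes the usual
/sys/bus/pci/devices layout.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a boot_vga attribute reads "1" for the boot device. */
static int parse_boot_vga(const char *text)
{
    return text != NULL && text[0] == '1';
}

/*
 * Hypothetical helper: scan a sysfs-style directory of PCI devices
 * (normally /sys/bus/pci/devices) and copy the name of the first device
 * whose boot_vga attribute reads "1" into out.
 * Returns 1 if found, 0 if no device qualifies, -1 if the directory
 * cannot be opened.
 */
static int find_boot_vga(const char *pci_dir, char *out, size_t out_len)
{
    DIR *dir = opendir(pci_dir);
    struct dirent *ent;
    int found = 0;

    if (!dir)
        return -1;

    while (!found && (ent = readdir(dir)) != NULL) {
        char path[4096];
        char buf[8] = "";
        FILE *f;

        if (ent->d_name[0] == '.')
            continue;

        snprintf(path, sizeof(path), "%s/%s/boot_vga", pci_dir, ent->d_name);
        f = fopen(path, "r");
        if (!f)
            continue; /* attribute is absent on non-VGA devices */

        if (fgets(buf, sizeof(buf), f) && parse_boot_vga(buf)) {
            snprintf(out, out_len, "%s", ent->d_name);
            found = 1;
        }
        fclose(f);
    }

    closedir(dir);
    return found;
}
```

Calling find_boot_vga("/sys/bus/pci/devices", name, sizeof(name)) on a
multi-GPU machine should yield a PCI address such as the 04:00.0 device
in the log, but that depends on the platform and is not verified here.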
Hi

On 06.09.23 at 04:34, suijingfeng wrote:
>
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> Hi
>>
>> On 05.09.23 at 15:30, suijingfeng wrote:
>>> Hi,
>>>
>>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>>
>>>>> On a machine with multiple GPUs, a Linux user has no control over
>>>>> which one is primary at boot time. This series tries to solve above
>>>>> mentioned
>>>>
>>>> If anything, the primary graphics adapter is the one initialized by
>>>> the firmware. I think our boot-up graphics also make this assumption
>>>> implicitly.
>>>>
>>>
>>> Yes, but by the time of DRM drivers get loaded successfully,the
>>> boot-up graphics already finished.
>>> Firmware framebuffer device already get killed by the
>>> drm_aperture_remove_conflicting_pci_framebuffers()
>>> function (or its siblings). So, this series is definitely not to
>>> interact with the firmware framebuffer
>>
>> Yes and no. The helpers you mention will attempt to remove the
>> firmware framebuffer on the given PCI device. If you have multiple PCI
>> devices, the other devices would not be affected.
>>
> Yes and no.
>
> For the yes part: drm_aperture_remove_conflicting_pci_framebuffers()
> only kill the conflict one.
> But for a specific machine with the modern UEFI firmware,
> there should be only one firmware framebuffer driver.
> That should be the EFIFB(UEFI GOP). I do have multiple PCI devices,
> but I don't understand when and why a system will have more than one
> firmware framebuffer.

Maybe somewhat unrelated to the actual discussion, but it's not as
simple as you assume. Many non-x86 systems use DeviceTree. On Sparc
IIRC, there's the case of having multiple firmware framebuffers listed
in the DT. We create a device for each and attach a DRM firmware
driver; ofdrm in this case. I haven't seen this in the wild, but
non-Sparc systems could also behave like that.

And in addition to that, ARM-based systems often use UEFI boot stub
code that provides a simple UEFI environment to the kernel. For
graphics we've had cases where we received the same firmware
framebuffer from the DT and from the UEFI boot stub. We have to detect
and handle such duplication in the kernel.

Best regards
Thomas

>
> Even for the machines with the legacy BIOS, the fixed VGA aperture
> address range can only be owned by one firmware driver. It is just
> that we need to handle the routing, the ->set_decode() callback of
> vga_client_register() is used to do such work. Am I correct?
>
Hi

On 06.09.23 at 05:08, suijingfeng wrote:
> Hi,
>
> On 2023/9/5 23:05, Thomas Zimmermann wrote:
>> However, on modern Linux systems the primary display does not really
>> exist. 'Primary' is the device that is available via VGA, VESA or EFI.
>
> I may miss the point, what do you means by choose the word "modern"?
> Are you trying to tell me that X server is too old and Wayland is the
> modern display server?

It comes down to that. Xorg's device handling is out of date. Fixing it
would require a redesign of the whole program. A 'modern' compositor
delegates device handling to the kernel. All it does is open the device
files and use the provided functionality. I've briefly mentioned this
in the other email.

There's more to 'modern', such as 'uses Wayland for compositing', 'Mesa
for direct rendering' or 'does atomic modesetting'. But that's all
unrelated here.

>
>> Our drivers don't use these interfaces, but the native registers.
>
> Yes and no?
>
> Yes for the machine with the UEFI firmware,
> but I not sure if this statement is true for the machine with the legacy
> firmware.

What I mean is: the primary device is the one that owns the
VGA/VESA/EFI I/O space. But DRM drivers don't program the hardware via
VGA registers or VESA/EFI calls. They use the hardware's actual native
registers in each device's I/O space. So each device operates on its
own. They (usually) don't have to share/arbitrate access to the VGA
registers. Hence the idea of a primary device does not make sense here.
It's useful to pick an initial default, but further display setup
should rather be left to userspace.

>
> As the display controller in the ASpeed BMC is VGA compatible.
> Therefore, in theory, it should works with the VGA console on the machine
> with another VGA compatible video card. So the ast_vga_set_decode()
> function provided in the 0007 patch probably useful on legacy firmware
> environment.
>
> To be honest, I have tested this on various machine with UEFI firmware.
> But I didn't realized that I should do the testing on legacy firmware
> environment before sending this patch. It seems that the testing effort
> needed are quite exhausting, since all my machines come with the UEFI
> firmware.
>
> So is it OK to leave the legacy part to someone else who interested in it?
> Probably Alex is more professional at legacy VGA routing stuff?

Maybe you can describe the user's problem to us. TBH I still don't
understand what you're trying to solve.

If you want to set the console's initial output device, you can make a
parameter in vgaarb. But I don't really see a need for that either.

Best regards
Thomas

> :-)
>
Hi

On 05.09.23 at 17:59, suijingfeng wrote:
[...]
>> FYI: per-driver modeset parameters are deprecated and not to be used.
>> Please don't promote them.
>
> Well, please wait, I want to explain.
>
> drm/nouveau already promote it a little bit.
>
> Despite no code of conduct or specification guiding how the modules
> parameters should be.
> Noticed that there already have a lot of DRM drivers support the modeset
> parameters,

Please look at the history and discussion around this parameter. To my
knowledge, 'modeset' got introduced when modesetting was still done in
userspace. It was an easy way of disabling the kernel driver if the
system's Xorg did not yet support kernel mode setting.

Fast forward a few years and all Linux systems use kernel modesetting,
which makes the modeset parameters obsolete. We discussed and decided
to keep them in, because many articles and blog posts refer to them. We
didn't want to invalidate them. BUT modeset is deprecated and not
allowed in new code. If you look at existing modeset usage, you will
eventually come across the comment at [1].

There's 'nomodeset', which disables all native drivers. It's useful for
debugging or as a quick-fix if the graphics driver breaks. If you want
to disable a specific driver, please use one of the options for
blacklisting.

Best regards
Thomas

[1] https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83

> for the modeset parameter, authors of various device driver try to make
> the usage not conflict with others. I believe that this is good thing
> for Linux users.
> It is probably the responsibility of the drm core maintainers to force
> various drm drivers to reach a minimal consensus. Probably it pains to
> do so and doesn't pay off.
> But reach a minimal consensus do benefit to Linux users.
>
>> You can use modprobe.blacklist or initcall_blacklist on the kernel
>> command line.
>>
> There are some cases where the modprobe.blacklist doesn't works,
> I have come cross several time during the past.
> Because the device selected by the VGAARB is device-level thing,
> it is not the driver's problem.
>
> Sometimes when VGAARB has a bug, it will select a wrong device as primary.
> And the X server will use this wrong device as primary and completely crash
> there, due to lack a driver. Take my old S3 Graphics as an example:
>
> $ lspci | grep VGA
>
> 00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display
> Controller) (rev 01)
> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
> 07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
> 08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
>
> Before apply this patch:
>
> [ 0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [ 0.361753] pci 0000:00:06.1: vgaarb: VGA device added:
> decodes=io+mem,owns=io+mem,locks=none
> [ 0.361765] pci 0000:03:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [ 0.361773] pci 0000:07:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [ 0.361779] pci 0000:08:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [ 0.361781] vgaarb: loaded
> [ 0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
> [ 0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
> [ 0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070
>
> For known reason, one of my system select the S3 Graphics as primary GPU.
> But this S3 Graphics not even have a decent drm upstream driver yet.
> Under such a case, I begin to believe that only the device who has a
> driver deserve the primary.
>
> Under such a condition, I want to reboot and enter the graphic environment
> with other working video cards. Either platform integrated and discrete
> GPU.
> This don't means I should compromise by un-mount the S3 graphics card from
> the motherboard, this also don't means that I should update my BIOS
> setting.
> As sometimes, the BIOS is more worse.
>
> With this series applied, all I need to do is to reboot the computer and
> pass a command line. By force override another video card (who has a
> decent driver support) as primary, I'm able to do the debugging under
> graphic environment. I would like to examine what's wrong with the vgaarb
> on a specific platform under X server graphic environment.
>
> Probably try compile a driver for this card and see it works, simply reboot
> without the need to change anything. It is so efficient. So this is
> probably the second usage of my patch. It hand the right of control
> back to the graphic developer.
>
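The vgaarb lines in the quoted boot log each carry three fields
(decodes=, owns=, locks=). In that log, the device reported as the boot
VGA device is also the only one whose owns= field is not "none". As a
rough illustration of reading such a line in a userspace tool, here is
a minimal sketch; the helper names vgaarb_field() and
vgaarb_owns_legacy() are invented for this example and are not kernel
or library APIs.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the value of one "name=value" field
 * (for example "owns") from a vgaarb log line such as
 * "... decodes=io+mem,owns=none,locks=none".
 * Returns 0 on success, -1 if the field is missing.
 */
static int vgaarb_field(const char *line, const char *name,
                        char *out, size_t out_len)
{
    char key[32];
    const char *p;
    size_t len;

    snprintf(key, sizeof(key), "%s=", name);
    p = strstr(line, key);
    if (!p || out_len == 0)
        return -1;

    p += strlen(key);
    len = strcspn(p, ",\n ");   /* a value ends at ',', newline or space */
    if (len >= out_len)
        len = out_len - 1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/*
 * Hypothetical helper: 1 if the line reports ownership of any legacy
 * VGA resource, 0 if it says "owns=none", -1 if there is no owns= field.
 */
static int vgaarb_owns_legacy(const char *line)
{
    char owns[16];

    if (vgaarb_field(line, "owns", owns, sizeof(owns)) != 0)
        return -1;
    return strcmp(owns, "none") != 0;
}
```

This only interprets the log text; it says nothing about why the
arbiter picked a given device.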
Hi,

On 2023/9/6 14:45, Christian König wrote:
> On 05.09.23 at 15:30, suijingfeng wrote:
>> Hi,
>>
>> On 2023/9/5 18:45, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> On 04.09.23 at 21:57, Sui Jingfeng wrote:
>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>
>>>> On a machine with multiple GPUs, a Linux user has no control over
>>>> which one is primary at boot time. This series tries to solve above
>>>> mentioned
>>>
>>> If anything, the primary graphics adapter is the one initialized by
>>> the firmware. I think our boot-up graphics also make this assumption
>>> implicitly.
>>>
>>
>> Yes, but by the time of DRM drivers get loaded successfully,the
>> boot-up graphics already finished.
>
> This is an incorrect assumption.
>
> drm_aperture_remove_conflicting_pci_framebuffers() and co don't kill
> the framebuffer,

Well, my original description of this technical point was:

1) "Firmware framebuffer device already get killed by the
drm_aperture_remove_conflicting_pci_framebuffers() function (or its
siblings)"

2) "By the time of DRM drivers get loaded successfully, the boot-up
graphics already finished."

The word "killed" here is a rough and coarse description of how the drm
device driver takes over the firmware framebuffer. Since something
seems to obscure our communication, let's make things clear. See below
for a more elaborate description.

> they just remove the current framebuffer driver to avoid further updates.
>

This statement doesn't sound right. For a UEFI environment, a correct
description is that they remove the platform device, not the
framebuffer driver. For machines with UEFI firmware, the framebuffer
driver here definitely refers to efifb. The efifb driver still resides
in the system (the Linux kernel). Please see the
aperture_detach_platform_device() function in video/aperture.c

> So what happens (at least for amdgpu) is that we take over the
> framebuffer,

This statement is also not an accurate description. Strictly speaking,
drm/amdgpu takes over the device (the VRAM hardware), not the
framebuffer.

The phrase "take over" is also dubious, because drm/amdgpu takes over
nothing. From the perspective of the device-driver model, the GPU
hardware *belongs* to the amdgpu driver. Why would you need to take
over a thing that originally belongs to you?

If you could build drm/amdgpu into the kernel and get it loaded before
efifb, there would be no need to use the firmware framebuffer (the
discussion is limited to the boot graphics purpose here). In such a
case, the so-called "take over" would not happen.

The truth is that efifb creates a platform device, which *occupies*
part of the VRAM hardware resource. Thus, efifb and drm/amdgpu
conflict. They conflict because they share the same hardware resource.
It is the hardware resources (address ranges) used by the two different
drivers that conflict, not the efifb driver itself that conflicts with
the drm/amdgpu driver.

Thus, the drm_aperture_remove_conflicting_xxxxxx() functions have to
kill one of the conflicting devices, not the driver. Therefore, the
correct word would be "reclaim": drm/amdgpu *reclaims* the hardware
resource (VRAM address range) that originally belonged to it.

The modeset state (including the framebuffer content) still resides in
the amdgpu device. You just get the dirty framebuffer image in the
framebuffer object. But the framebuffer object has been dirty since the
UEFI firmware stage.

In conclusion, *reclaim* is more accurate than "take over". And as far
as I understand, drm/amdgpu takes over nothing and gains nothing.

Well, you are welcome to correct me if I'm wrong.
On 06.09.23 at 11:08, suijingfeng wrote:
> Well, welcome to correct me if I'm wrong.
You seem to have some very basic misunderstandings here.
The term framebuffer describes some VRAM memory used for scanout.
This framebuffer is exposed to userspace through some framebuffer
driver; on UEFI platforms that is usually efifb, but it can be any of
quite a few different drivers.
When the DRM drivers load they remove the previous drivers using
drm_aperture_remove_conflicting_pci_framebuffers() (or similar
function), but this does not mean that the framebuffer or scanout
parameters are modified in any way. It just means that the framebuffer
is no longer exposed through this driver.
Take over is the perfectly right description here because that's exactly
what's happening. The framebuffer configuration including the VRAM
memory as well as the parameters for scanout are exposed by the newly
loaded DRM driver.
In other words, userspace can query through the DRM interfaces which
monitors are already driven by the hardware and so, in your terminology,
figure out which is the primary one.
It's just that, as Thomas explained as well, this is completely
irrelevant to any modern desktop. Both X and Wayland iterate the
available devices and start rendering to them; which one was used during
boot doesn't really matter to them.
Apart from that, ranting like this and trying to explain stuff to people
who obviously have a much better background in the topic is not going to
help get your patches upstream.
Regards,
Christian.
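As a small illustration of the point that userspace can ask the DRM
side which outputs are already being driven: the proper route is the
DRM ioctl interface (typically through libdrm), but to stay free of
external libraries the sketch below substitutes the connector
attributes that DRM exposes in sysfs. It assumes the usual
/sys/class/drm/<card>-<connector>/enabled layout; the helper names are
hypothetical, and whether a connector shows as enabled right after the
driver loads depends on the driver.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the text of a connector's "enabled"
 * attribute, which reads either "enabled" or "disabled".
 */
static int connector_text_enabled(const char *text)
{
    return text != NULL && strncmp(text, "enabled", 7) == 0;
}

/*
 * Hypothetical helper: read <connector_dir>/enabled, where
 * connector_dir is something like /sys/class/drm/card0-HDMI-A-1.
 * Returns 1 if enabled, 0 if disabled, -1 if the attribute is unreadable.
 */
static int connector_enabled(const char *connector_dir)
{
    char path[4096];
    char buf[32] = "";
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "%s/enabled", connector_dir);
    f = fopen(path, "r");
    if (!f)
        return -1;

    if (fgets(buf, sizeof(buf), f))
        ret = connector_text_enabled(buf);

    fclose(f);
    return ret;
}
```

A compositor would normally not go through sysfs at all; it would open
the /dev/dri/card* nodes and query connectors and CRTCs directly.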
Hi,

On 2023/9/6 16:05, Thomas Zimmermann wrote:
> Hi
>
> On 05.09.23 at 17:59, suijingfeng wrote:
> [...]
>>> FYI: per-driver modeset parameters are deprecated and not to be
>>> used. Please don't promote them.
>>
>> Well, please wait, I want to explain.
>>
>> drm/nouveau already promote it a little bit.
>>
>> Despite no code of conduct or specification guiding how the modules
>> parameters should be.
>> Noticed that there already have a lot of DRM drivers support the
>> modeset parameters,
>
> Please look at the history and discussion around this parameter. To my
> knowledge, 'modeset' got introduced when modesetting was still done
> in userspace. It was an easy way of disabling the kernel driver if the
> system's Xorg did not yet support kernel mode setting.
>
> Fast forward a few years and all Linux systems use kernel modesetting,
> which makes the modeset parameters obsolete. We discussed and decided
> to keep them in, because many articles and blog posts refer to them.
> We didn't want to invalidate them. BUT modeset is deprecated and not
> allowed in new code. If you look at existing modeset usage, you will
> eventually come across the comment at [1].
>

OK, no problem. I agree with what you said.

> There's 'nomodeset', which disables all native drivers. It's useful
> for debugging or as a quick-fix if the graphics driver breaks. If you
> want to disable a specific driver, please use one of the options for
> blacklisting.
>

Yeah, 'nomodeset' disables all native drivers. This is its strong
point, but it is also its weak point.

Sometimes, when you are developing a drm driver for a new device, you
will feel the pain. All too often a programmer's modification makes the
entire Linux kernel hang. The problematic drm driver kernel module is
already in the initrd. Then the real need is to disable only the
malfunctioning drm driver kernel module, while what you recommend
disables them all. There is a subtle difference.

Another limitation of the 'nomodeset' parameter is that it is only
available on recent upstream kernels. Older downstream kernels don't
support this parameter yet. So this creates an inconsistent development
experience. I believe there are always some people who need to do
back-porting and upstream work for various reasons.

While debating (kindly, no offense intended): since we have
modprobe.blacklist, why do we still need the 'nomodeset' parameter?
Why not try
modprobe.blacklist="amdgpu,radeon,i915,ast,nouveau,gma500_gfx, ..."

:-/

But OK, overall I will listen to your advice.

> Best regards
> Thomas
>
> [1]
> https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83
>
Hi,

On 2023/9/6 14:45, Christian König wrote:
>> Firmware framebuffer device already get killed by the
>> drm_aperture_remove_conflicting_pci_framebuffers()
>> function (or its siblings). So, this series is definitely not to
>> interact with the firmware framebuffer
>> (or more intelligent framebuffer drivers). It is for user space
>> program, such as X server and Wayland
>> compositor. Its for Linux user or drm drivers testers, which allow
>> them to direct graphic display server
>> using right hardware of interested as primary video card.
>>
>> Also, I believe that X server and Wayland compositor are the best
>> test examples.
>> If a specific DRM driver can't work with X server as a primary,
>> then there probably have something wrong.
>>
>>> But what's the use case for overriding this setting?
>>>
>>
>> On a specific machine with multiple GPUs mounted,
>> only the primary graphics get POST-ed (initialized) by the firmware.
>> Therefore, the DRM drivers for the rest video cards, have to choose to
>> work without the prerequisite setups done by firmware, This is called
>> as POST.
>
> Well, you don't seem to understand the background here. This is
> perfectly normal behavior.
>
> Secondary cards are posted after loading the appropriate DRM driver.
> At least for amdgpu this is done by calling the appropriate functions
> in the BIOS.

Well, thanks for telling me this. You know more than me and definitely
have a better understanding.

Are you telling me that the POST function for AMDGPU resides in the
BIOS? The kernel calls into the BIOS?
Does the BIOS here refer to the UEFI runtime or ATOM BIOS or something
else?

But the POST function for drm/ast resides in kernel space (in other
words, in ast.ko). Is this statement correct?

I mean that for the ASpeed BMC chip, if the firmware does not POST the
display controller, then we have to POST it in kernel space before
doing any modeset operation. We can only POST this chip by directly
operating the various registers. Is my judgement about the ast drm
driver correct?

Thanks for your reviews.
On 06.09.23 at 12:31, Sui Jingfeng wrote:
> Hi,
>
> On 2023/9/6 14:45, Christian König wrote:
[...]
>> Well, you don't seem to understand the background here. This is
>> perfectly normal behavior.
>>
>> Secondary cards are posted after loading the appropriate DRM driver.
>> At least for amdgpu this is done by calling the appropriate functions
>> in the BIOS.
>
> Well, thanks for you tell me this. You know more than me and
> definitely have a better understanding.
>
> Are you telling me that the POST function for AMDGPU reside in the BIOS?
> The kernel call into the BIOS?

Yes, exactly that.

> Does the BIOS here refer to the UEFI runtime or ATOM BIOS or something
> else?

On dGPUs it's the VBIOS on a flashrom on the board, for iGPUs (APUs as
AMD calls them) it's part of the system BIOS.

UEFI is actually just a small subsystem in the system BIOS which
replaced the old interface used between system BIOS, video BIOS and
operating system.

>
> But the POST function for the drm ast, reside in the kernel space (in
> other word, in ast.ko).
> Is this statement correct?

I don't know the ast driver well enough to answer that, but I assume
they just read the BIOS and execute the appropriate functions.

>
> I means that for ASpeed BMC chip, if the firmware not POST the display
> controller.
> Then we have to POST it at the kernel space before doing various
> modeset option.
> We can only POST this chip by directly operate the various registers.
> Am I correct for the judgement about ast drm driver?

Well, POST just means Power On Self Test, but what you mean is
initializing the hardware.

Some drivers can of course initialize the hardware without the help of
the BIOS, but I don't think AST can do that. As far as I know it's a
relatively simple driver.

BTW firmware is not the same as the BIOS (which runs the POST);
firmware usually refers to something run on microcontrollers inside the
ASIC, while the (system or video) BIOS runs on the host CPU.

Regards,
Christian.

>
> Thanks for your reviews.
>
Hi Am 06.09.23 um 11:48 schrieb suijingfeng: [...] > >> There's 'nomodeset', which disables all native drivers. It's useful >> for debugging or as a quick-fix if the graphics driver breaks. If you >> want to disable a specific driver, please use one of the options for >> blacklisting. >> > Yeah, the 'nomodeset' disables all native drivers, > this is a good point of it, but this is also the weak point of it. Well, that's by design. Graphics is at the core of the user experience. We often cannot _not_ provide it. And if it's broken, there needs to be a reliable fallback. There needs to be at least enough graphics support to run a terminal and repair the system. And it also needs to be simple enough for the average user. Falling back to serial terminals if often not an option. At least here at SUSE, when users or customers report a broken graphics driver, we can tell them to start with 'nomodeset' and get at least the basic graphics. That's good enough for most productivity/office software. In the meantime, we investigate the problem. There were concerns about the need of nomodeset, but I think it has proven to be useful in practice. > Sometimes, when you are developing a drm driver for a new device. > You will see the pain. Its too often a programmer's modification > make the entire Linux kernel hang there. The problematic drm > driver kernel module already in the initrd. Then, the real > need to disable the ill-functional drm driver kernel module > only. While what you recommend to disable them all. There > are subtle difference. I found that initcall_blacklist=<func name> works reliable for me. > > Another limitation of the 'nomodeset' parameter is that > it is only available on recent upstream kernel. Low version > downstream kernel don't has this parameter supported yet. > So this create inconstant developing experience. I believe that > there always some people need do back-port and upstream work > for various reasons. 
Nomodeset used to be there, but in a different form. It forced VGA text mode IIRC. 'git grep' for vga_text_force() in an old kernel. We adopted the parameter for all of graphics, because it already did what we needed. Best regards Thomas > > While (kindly, no offensive) debating, since we have the modprobe.blacklist > why we still need the 'nomodeset' parameter ? > why not try > modprobe.blacklist="amdgpu,radeon,i915,ast,nouveau,gma500_gfx, ..." > > :-/ > > > But OK in overall, I will listen to your advice. > > >> Best regards >> Thomas >> >> [1] >> https://elixir.bootlin.com/linux/v6.5/source/include/drm/drm_module.h#L83 >> >> >>> for the modeset parameter, authors of various device driver try to >>> make the usage not >>> conflict with others. I believe that this is good thing for Linux users. >>> It is probably the responsibility of the drm core maintainers to >>> force various drm >>> drivers to reach a minimal consensus. Probably it pains to do so and >>> doesn't pay off. >>> But reach a minimal consensus do benefit to Linux users. >>> >>> >>>> You can use modprobe.blacklist or initcall_blacklist on the kernel >>>> command line. >>>> >>> There are some cases where the modprobe.blacklist doesn't works, >>> I have come cross several time during the past. >>> Because the device selected by the VGAARB is device-level thing, >>> it is not the driver's problem. >>> >>> Sometimes when VGAARB has a bug, it will select a wrong device as >>> primary. >>> And the X server will use this wrong device as primary and completely >>> crash >>> there, due to lack a driver. Take my old S3 Graphics as an example: >>> >>> $ lspci | grep VGA >>> >>> 00:06.1 VGA compatible controller: Loongson Technology LLC DC >>> (Display Controller) (rev 01) >>> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. >>> [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM] >>> 07:00.0 VGA compatible controller: S3 Graphics Ltd. 
Device 9070 >>> (rev 01) >>> 08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 >>> (rev 01) >>> >>> Before apply this patch: >>> >>> [ 0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device >>> [ 0.361753] pci 0000:00:06.1: vgaarb: VGA device added: >>> decodes=io+mem,owns=io+mem,locks=none >>> [ 0.361765] pci 0000:03:00.0: vgaarb: VGA device added: >>> decodes=io+mem,owns=none,locks=none >>> [ 0.361773] pci 0000:07:00.0: vgaarb: VGA device added: >>> decodes=io+mem,owns=none,locks=none >>> [ 0.361779] pci 0000:08:00.0: vgaarb: VGA device added: >>> decodes=io+mem,owns=none,locks=none >>> [ 0.361781] vgaarb: loaded >>> [ 0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778 >>> [ 0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070 >>> [ 0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070 >>> >>> >>> For known reason, one of my system select the S3 Graphics as primary >>> GPU. >>> But this S3 Graphics not even have a decent drm upstream driver yet. >>> Under such a case, I begin to believe that only the device who has a >>> driver deserve the primary. >>> >>> Under such a condition, I want to reboot and enter the graphic >>> environment >>> with other working video cards. Either platform integrated and >>> discrete GPU. >>> This don't means I should compromise by un-mount the S3 graphics card >>> from >>> the motherboard, this also don't means that I should update my BIOS >>> setting. >>> As sometimes, the BIOS is more worse. >>> >>> With this series applied, all I need to do is to reboot the computer and >>> pass a command line. By force override another video card (who has a >>> decent driver support) as primary, I'm able to do the debugging under >>> graphic environment. I would like to examine what's wrong with the >>> vgaarb >>> on a specific platform under X server graphic environment. 
>>> >>> Probably try compile a driver for this card and see it works, simply >>> reboot >>> without the need to change anything. It is so efficient. So this is >>> probably >>> the second usage of my patch. It hand the right of control back to the >>> graphic developer. >>> >>> >> >
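The thread above weighs 'nomodeset' against modprobe.blacklist= and initcall_blacklist=. As a rough illustration of how these options appear on a kernel command line, here is a minimal Python sketch. The helper names and the sample command line are hypothetical and made up for illustration; only the parameter names come from the messages above.

```python
import shlex


def blacklisted_modules(cmdline):
    """Return the module names listed in modprobe.blacklist= options."""
    modules = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep and key == "modprobe.blacklist":
            # The value is a comma-separated list of module names.
            modules.extend(name.strip() for name in value.split(",") if name.strip())
    return modules


def has_flag(cmdline, flag):
    """Return True if a bare flag such as 'nomodeset' is present."""
    return flag in shlex.split(cmdline)


# Hypothetical command line; on a running system it could be read from /proc/cmdline.
sample = 'root=/dev/sda1 modprobe.blacklist="amdgpu,radeon" nomodeset'
print(blacklisted_modules(sample))
print(has_flag(sample, "nomodeset"))
```

The sketch only inspects the command line. It says nothing about which device the VGA arbiter picks, which is the device-level point raised in the quoted message.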
On Wed, 6 Sep 2023 11:51:59 +0800 Sui Jingfeng <sui.jingfeng@linux.dev> wrote: > Hi, > > > On 2023/9/5 22:52, Alex Williamson wrote: > > On Tue, 5 Sep 2023 03:57:15 +0800 > > Sui Jingfeng <sui.jingfeng@linux.dev> wrote: > > > >> From: Sui Jingfeng <suijingfeng@loongson.cn> > >> > >> On a machine with multiple GPUs, a Linux user has no control over which > >> one is primary at boot time. This series tries to solve above mentioned > >> problem by introduced the ->be_primary() function stub. The specific > >> device drivers can provide an implementation to hook up with this stub by > >> calling the vga_client_register() function. > >> > >> Once the driver bound the device successfully, VGAARB will call back to > >> the device driver. To query if the device drivers want to be primary or > >> not. Device drivers can just pass NULL if have no such needs. > >> > >> Please note that: > >> > >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would > >> like to mount at least three video cards. > >> > >> 2) Typically, those non-86 machines don't have a good UEFI firmware > >> support, which doesn't support select primary GPU as firmware stage. > >> Even on x86, there are old UEFI firmwares which already made undesired > >> decision for you. > >> > >> 3) This series is attempt to solve the remain problems at the driver level, > >> while another series[1] of me is target to solve the majority of the > >> problems at device level. > >> > >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics > >> 630 is the default boot VGA, successfully override by ast2400 with > >> ast.modeset=10 append at the kernel cmd line. > >> > >> $ lspci | grep VGA > >> > >> 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] > > In all my previous experiments with VGA routing and IGD I found that > > IGD can't actually release VGA routing and Intel confirmed the hardware > > doesn't have the ability to do so. 
> > Which model of the IGD you are using? even for the IGD in Atom D2550, > the legacy 128KB VGA memory range can be tuned to be mapped to IGD > or to the DMI Interface. See the 1.7.3.2 section of the N2000 datasheet[1]. I believe it's the VGA I/O that can't be disabled, there's no means to do so other than the I/O enable bit in the command register and iirc the driver depends on this for other features. The history of this is pretty old, but here are some links: https://lore.kernel.org/all/1376486637.31494.19.camel@ul30vt.home/ https://bbs.archlinux.org/viewtopic.php?pid=1400212#p1400212 https://lore.kernel.org/all/20130815223917.27890.28003.stgit@bling.home/ https://lore.kernel.org/all/20130824144701.23370.42110.stgit@bling.home/ https://lore.kernel.org/all/20140509201655.2849.97478.stgit@bling.home/ I think the issue was that i915 doesn't claim to the VGA arbiter to be controlling legacy VGA ranges, but in fact the hardware does claim those ranges. We can "fix" i915 to report that VGA MMIO space is owned and can be controlled, but then Xorg likely sees multiple VGA arbiter clients and disables DRI because it wants to mmap VGA MMIO space. Therefore unless something has changed in the past 10yrs, i915 owns but does not advertise ownership of the VGA address spaces and therefore the arbiter can't and doesn't know to change VGA routing to enable a "be_primary" path to another device. > If a specific model of Intel has a bug in the VGA routing hardware logic unit, > I would like to ignore it. Or switch to the UEFI firmware on such hardware. That's a convenient and impractical approach. I expect all Intel HD graphics has this issue. Unknown for Xe. > It is the hardware engineer's responsibility, I will not worry about it. We often need to deal with broken hardware in the kernel. > Thanks for you tell this. > > [1] https://www.intel.com/content/dam/doc/datasheet/atom-d2000-n2000-vol-2-datasheet.pdf > > > > It will always be primary from a > > VGA routing perspective. 
Was this actually tested with non-UEFI? > > > As you already said, the generous Intel already have confirmed that the hardware defect. > So probably this is a good chance to switch to UEFI to solve the problem. Then, no > testing for legacy is needed. Then why are we hacking on VGA arbitration in this series at all? > > I suspect it might only work in UEFI mode where we probably don't > > actually have a dependency on VGA routing. This is essentially why > > vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too > > broken to use on Intel systems with IGD. Thanks, > > Thanks for you tell me this. > > To be honest, I have only tested my patch on machines with UEFI firmware. > Since UEFI because the main stream, but if this patch is really useful for > majority machine, I'm satisfied. The results is not too bad. This looks like a pretty significant scoping issue if you're proposing changes to the VGA arbiter which specifically handles the routing of legacy VGA address spaces but are not willing to commit to testing legacy configurations. Thanks, Alex
Hi,

On 2023/9/6 17:40, Christian König wrote:
> On 06.09.23 at 11:08, suijingfeng wrote:
>> Well, welcome to correct me if I'm wrong.
>
> You seem to have some very basic misunderstandings here.
>
> The term framebuffer describes some VRAM memory used for scanout.
>
> This framebuffer is exposed to userspace through some framebuffer driver, on UEFI platforms that is usually efifb but can be quite a bunch of different drivers.
>
> When the DRM drivers load they remove the previous drivers using drm_aperture_remove_conflicting_pci_framebuffers() (or similar function), but this does not mean that the framebuffer or scanout parameters are modified in any way. It just means that the framebuffer is just no longer exposed through this driver.
>
> Take over is the perfectly right description here because that's exactly what's happening. The framebuffer configuration including the VRAM memory as well as the parameters for scanout are exposed by the newly loaded DRM driver.
>
> In other words userspace can query through the DRM interfaces which monitors already driven by the hardware and so in your terminology figure out which is the primary one.
>

I'm not fully convinced by this idea, though you might be correct.

But there are cases where there are multiple monitors and each video card is connected to one of them.

It is also quite common that no monitor is connected at all: let the machine boot first, then find a monitor, connect it to a random display output and see which one displays. I don't expect the primary to change with that.
The primary one has to be determined as early as possible, because the VGA console and the framebuffer console may output directly to the primary.
Getting DDC and/or HPD involved may unnecessarily complicate the problem.

There are ASpeed BMCs which add a virtual connector in order to be able to display remotely.
There are also commands to force a connector into the connected state.

> It's just that as Thomas explained as well that this completely irrelevant to any modern desktop. Both X and Wayland both iterate the available devices and start rendering to them which one was used during boot doesn't really matter to them.
>

You may be correct, but I'm still not sure.
I probably need more time to investigate.
My colleagues and I mainly use the X server; the versions range from 1.20.4 to 1.21.1.4.
Even if this is true, the problems still exist for non-modern desktops.

> Apart from that ranting like this and trying to explain stuff to people who obviously have much better background in the topic is not going to help your patches getting upstream.
>

Thanks for sharing so much knowledge with me; I now understand where the problems are.
I will try to resolve the concerns in the next version.

> Regards,
> Christian.
>
Am 07.09.23 um 04:30 schrieb Sui Jingfeng: > Hi, > > > On 2023/9/6 17:40, Christian König wrote: >> Am 06.09.23 um 11:08 schrieb suijingfeng: >>> Well, welcome to correct me if I'm wrong. >> >> You seem to have some very basic misunderstandings here. >> >> The term framebuffer describes some VRAM memory used for scanout. >> >> This framebuffer is exposed to userspace through some framebuffer >> driver, on UEFI platforms that is usually efifb but can be quite a >> bunch of different drivers. >> >> When the DRM drivers load they remove the previous drivers using >> drm_aperture_remove_conflicting_pci_framebuffers() (or similar >> function), but this does not mean that the framebuffer or scanout >> parameters are modified in any way. It just means that the >> framebuffer is just no longer exposed through this driver. >> >> Take over is the perfectly right description here because that's >> exactly what's happening. The framebuffer configuration including the >> VRAM memory as well as the parameters for scanout are exposed by the >> newly loaded DRM driver. >> >> In other words userspace can query through the DRM interfaces which >> monitors already driven by the hardware and so in your terminology >> figure out which is the primary one. >> > I'm a little bit of not convinced about this idea, you might be correct. Well I can point you to the code if you don't believe me. > But there cases where three are multiple monitors and each video card > connect one. Yeah, but this is irrelevant. The key point is the configuration is taken over when the driver loads. So whatever is there before as setup (one monitor showing console, three monitors mirrored, whatever) should be there after loading the driver as well. This configuration is just immediately overwritten because nobody cares about it. > > It also quite common that no monitors is connected, let the machine boot > first, then find a monitors to connect to a random display output. See > which will display. 
I don't expect the primary shake with. > The primary one have to be determined as early as possible, because of > the VGA console and the framebuffer console may directly output the > primary. Well that is simply not correct. There is not concept of "primary" display, it can just be that a monitor was brought up by the BIOS or bootloader and we take over this configuration. > Get the DDC and/or HPD involved may necessary complicated the problem. > > There are ASpeed BMC who add a virtual connector in order to able > display remotely. > There are also have commands to force a connector to be connected status. > > >> It's just that as Thomas explained as well that this completely >> irrelevant to any modern desktop. Both X and Wayland both iterate the >> available devices and start rendering to them which one was used >> during boot doesn't really matter to them. >> > You may be correct, but I'm still not sure. > I probably need more times to investigate. > Me and my colleagues are mainly using X server, > the version varies from 1.20.4 and 1.21.1.4. > Even this is true, the problems still exist for non-modern desktops. Well, I have over 25 years of experience with display hardware and what you describe here was never an issue. What you have is simply a broken display driver which for some reason can't handle your use case. I strongly suggest that you just completely drop this here and go into the AST driver and try to fix it. Regards, Christian. > >> Apart from that ranting like this and trying to explain stuff to >> people who obviously have much better background in the topic is not >> going to help your patches getting upstream. >> > > Thanks for you tell me so much knowledge, > I'm realized where are the problems now. > I will try to resolve the concerns at the next version. > > >> Regards, >> Christian. >>
On Wed, 06 Sep 2023, suijingfeng <suijingfeng@loongson.cn> wrote:
> Another limitation of the 'nomodeset' parameter is that
> it is only available on recent upstream kernel. Low version
> downstream kernel don't has this parameter supported yet.
> So this create inconstant developing experience. I believe that
> there always some people need do back-port and upstream work
> for various reasons.

While that may be true, it's not an argument in favour of adding new module parameters or special values to existing module parameters. They would have to be backported just as well.

BR,
Jani.
Hi,

On 2023/9/7 17:08, Christian König wrote:
> Well, I have over 25 years of experience with display hardware and what you describe here was never an issue.

Let me give you an example to explain further.

I have an ASRock AD2550B-ITX board[1].
When another discrete video card is mounted in its mini PCIe slot or PCI slot, the IGD cannot be the primary display adapter anymore. The display is totally black.
I have tried to draft a few trivial patches to help fix this[2].

I want to use the IGD as primary. Does this count as an issue?

[1] https://www.asrock.com/mb/Intel/AD2550-ITX/
[2] https://patchwork.freedesktop.org/series/123073/
On 07.09.23 at 14:32, suijingfeng wrote:
> Hi,
>
>
> On 2023/9/7 17:08, Christian König wrote:
>> Well, I have over 25 years of experience with display hardware and what you describe here was never an issue.
>
> I want to give you an example to let you know more.
>
> I have a ASRock AD2550B-ITX board[1],
> When another discrete video card is mounted into it mini PCIe slot or PCI slot,
> The IGD cannot be the primary display adapter anymore. The display is totally black.
> I have try to draft a few trivial patch to help fix this[2].
>
> And I want to use the IGD as primary, does this count as an issue?

No, this is completely expected behavior and a limitation of the hardware design.

As far as I know both AMD and Intel GPUs work the same here.

Regards,
Christian.

>
> [1] https://www.asrock.com/mb/Intel/AD2550-ITX/
> [2] https://patchwork.freedesktop.org/series/123073/
>
Hi,

On 2023/9/7 20:43, Christian König wrote:
> On 07.09.23 at 14:32, suijingfeng wrote:
>> Hi,
>>
>>
>> On 2023/9/7 17:08, Christian König wrote:
>>> Well, I have over 25 years of experience with display hardware and what you describe here was never an issue.
>>
>> I want to give you an example to let you know more.
>>
>> I have a ASRock AD2550B-ITX board[1],
>> When another discrete video card is mounted into it mini PCIe slot or PCI slot,
>> The IGD cannot be the primary display adapter anymore. The display is totally black.
>> I have try to draft a few trivial patch to help fix this[2].
>>
>> And I want to use the IGD as primary, does this count as an issue?
>
> No, this is completely expected behavior and a limitation of the hardware design.
>
> As far as I know both AMD and Intel GPUs work the same here.
>
> Regards,
> Christian.
>
>>
>> [1] https://www.asrock.com/mb/Intel/AD2550-ITX/
>> [2] https://patchwork.freedesktop.org/series/123073/
>>

Then I'll give you another example; see below for a detailed description.
I have one AMD BC160 GPU, see [1] for what it looks like.

The GPU has no display connector exported.
It can actually be seen as a render-only GPU, or a compute-class GPU for bitcoin.
But its firmware still claims that this GPU is VGA compatible.
When this GPU is mounted on the motherboard, the system always selects it as primary.
But this GPU cannot be connected to a monitor.

In such a situation modprobe.blacklist=amdgpu doesn't work either, because vgaarb always selects this GPU as primary; this is a device-level decision.

$ dmesg | grep vgaarb:

[    3.541405] pci 0000:0c:00.0: vgaarb: BAR 0: [mem 0xa0000000-0xafffffff 64bit pref] contains firmware FB [0xa0000000-0xa02fffff]
[    3.901448] pci 0000:05:00.0: vgaarb: setting as boot VGA device
[    3.905375] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    3.905382] pci 0000:0c:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    3.909375] pci 0000:0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    3.913375] pci 0000:0d:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    3.913377] vgaarb: loaded
[   13.513760] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[   19.020992] amdgpu 0000:0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem

I'm using an Ubuntu 22.04 system. With ast.modeset=10 passed on the kernel command line, I am still able to enter the graphical environment and treat this GPU as a render-only GPU.
I can then continue to examine what's wrong. Apart from this, drm/amdgpu reports " *ERROR* IB test failed on sdma0 (-110)" to me.

Does this count as a problem?

Until I find a solution, I have to keep this de facto render-only GPU mounted, because I need to recompile the kernel module, install it and test it.

All I need is a 2D video card to display something. The ast drm driver is OK, despite being simple.
It suits my daily usage with VIM, and that's enough for me.

Now, the real questions that I want to ask are:

1) When the kernel driver module is blocked (by modprobe.blacklist=amdgpu), vgaarb still selects the device as primary, which leaves the X server crashing (because no kernel-space driver is loaded). Does this count as a problem?

2) Does my approach of mounting another GPU as the primary display adapter, while its real purpose is bug fixing and development for another GPU, count as a use case?
$ cat demsg.txt | grep drm [ 10.099888] ACPI: bus type drm_connector registered [ 11.083920] etnaviv 0000:0d:00.0: [drm] bind etnaviv-display, master name: 0000:0d:00.0 [ 11.084106] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:0d:00.0 on minor 0 [ 13.301702] [drm] amdgpu kernel modesetting enabled. [ 13.359820] [drm] initializing kernel modesetting (NAVI12 0x1002:0x7360 0x1002:0x0A34 0xC7). [ 13.368246] [drm] register mmio base: 0xEB100000 [ 13.372861] [drm] register mmio size: 524288 [ 13.380788] [drm] add ip block number 0 <nv_common> [ 13.385661] [drm] add ip block number 1 <gmc_v10_0> [ 13.390531] [drm] add ip block number 2 <navi10_ih> [ 13.395405] [drm] add ip block number 3 <psp> [ 13.399760] [drm] add ip block number 4 <smu> [ 13.404111] [drm] add ip block number 5 <dm> [ 13.408378] [drm] add ip block number 6 <gfx_v10_0> [ 13.413249] [drm] add ip block number 7 <sdma_v5_0> [ 13.433546] [drm] add ip block number 8 <vcn_v2_0> [ 13.433547] [drm] add ip block number 9 <jpeg_v2_0> [ 13.497757] [drm] VCN decode is enabled in VM mode [ 13.502540] [drm] VCN encode is enabled in VM mode [ 13.508785] [drm] JPEG decode is enabled in VM mode [ 13.529596] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit [ 13.564762] [drm] Detected VRAM RAM=8176M, BAR=256M [ 13.569628] [drm] RAM width 2048bits HBM [ 13.574167] [drm] amdgpu: 8176M of VRAM memory ready [ 13.579125] [drm] amdgpu: 15998M of GTT memory ready. [ 13.584184] [drm] GART: num cpu pages 131072, num gpu pages 131072 [ 13.590505] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000). [ 13.598749] [drm] Found VCN firmware Version ENC: 1.16 DEC: 5 VEP: 0 Revision: 4 [ 13.671786] [drm] reserve 0xe00000 from 0x81fd000000 for PSP TMR [ 13.801235] [drm] Display Core v3.2.247 initialized on DCN 2.0 [ 13.807061] [drm] DP-HDMI FRL PCON supported [ 13.832382] [drm] kiq ring mec 2 pipe 1 q 0 [ 13.838131] [drm] VCN decode and encode initialized successfully(under DPG Mode). 
[ 13.845877] [drm] JPEG decode initialized successfully. [ 14.072508] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:0c:00.0 on minor 1 [ 14.080976] amdgpu 0000:0c:00.0: [drm] Cannot find any crtc or sizes [ 14.087341] [drm] DSC precompute is not needed. [ 16.487330] systemd[1]: Starting Load Kernel Module drm... [ 619.901873] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000). [ 619.901898] [drm] PSP is resuming... [ 619.925307] [drm] reserve 0xe00000 from 0x81fd000000 for PSP TMR [ 619.991034] [drm] psp gfx command AUTOLOAD_RLC(0x21) failed and response status is (0xFFFF000D) [ 620.294366] [drm] kiq ring mec 2 pipe 1 q 0 [ 620.298953] [drm] VCN decode and encode initialized successfully(under DPG Mode). [ 620.299103] [drm] JPEG decode initialized successfully. [ 621.309543] [drm:sdma_v5_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out [ 621.317577] amdgpu 0000:0c:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110). [ 622.333548] [drm:sdma_v5_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out [ 622.341587] amdgpu 0000:0c:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110). [ 622.354071] [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110). [ 622.363721] amdgpu 0000:0c:00.0: [drm] Cannot find any crtc or sizes [1] https://www.techpowerup.com/gpu-specs/xfx-bc-160.b9346
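The vgaarb lines quoted in this message show the boot VGA device being set once and then overridden. A small Python sketch, using log lines copied from the message, can make that reading explicit. The function name is hypothetical and the regular expression assumes the message format shown here; it is not a stable kernel interface.

```python
import re

# Assumed format, taken from the dmesg excerpt in this message.
BOOT_VGA = re.compile(
    r"pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"vgaarb: setting as boot VGA device"
)


def boot_vga_device(log):
    """Return the PCI address last reported as boot VGA device, or None."""
    addr = None
    for line in log.splitlines():
        match = BOOT_VGA.search(line)
        if match:
            # A later "(overriding previous)" line replaces the earlier pick.
            addr = match.group("addr")
    return addr


sample_log = """\
[    3.901448] pci 0000:05:00.0: vgaarb: setting as boot VGA device
[    3.905375] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    3.905382] pci 0000:0c:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    3.909375] pci 0000:0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
"""
print(boot_vga_device(sample_log))
```

For the excerpt above this reports 0000:0c:00.0, the amdgpu device that the message describes as having no display connector.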
Am 07.09.23 um 17:26 schrieb suijingfeng: > [SNIP] > Then, I'll give you another example, see below for elaborate description. > I have one AMD BC160 GPU, see[1] to get what it looks like. > > The GPU don't has a display connector interface exported. > It actually can be seen as a render-only GPU or compute class GPU for > bitcoin. > But the firmware of it still acclaim this GPU as VGA compatible. > When mount this GPU onto motherboard, the system always select this > GPU as primary. > But this GPU can't be able to connect with a monitor. > > Under such a situation, modprobe.blacklist=amdgpu don't works either, > because vgaarb always select this GPU as primary, this is a > device-level decision. It's not VGAARB which makes this selection, it's the BIOS. VGAARB just detects what the BIOS has decided. > > $ dmesg | grep vgaarb: > > [ 3.541405] pci 0000:0c:00.0: vgaarb: BAR 0: [mem > 0xa0000000-0xafffffff 64bit pref] contains firmware FB > [0xa0000000-0xa02fffff] > [ 3.901448] pci 0000:05:00.0: vgaarb: setting as boot VGA device > [ 3.905375] pci 0000:05:00.0: vgaarb: VGA device added: > decodes=io+mem,owns=none,locks=none > [ 3.905382] pci 0000:0c:00.0: vgaarb: setting as boot VGA device > (overriding previous) > [ 3.909375] pci 0000:0c:00.0: vgaarb: VGA device added: > decodes=io+mem,owns=io+mem,locks=none > [ 3.913375] pci 0000:0d:00.0: vgaarb: VGA device added: > decodes=io+mem,owns=none,locks=none > [ 3.913377] vgaarb: loaded > [ 13.513760] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console > [ 19.020992] amdgpu 0000:0c:00.0: vgaarb: changed VGA decodes: > olddecodes=io+mem,decodes=none:owns=io+mem > > I'm using ubuntu 22.04 system, with ast.modeset=10 passed on the cmd > line, > I still be able to enter the graphics system. And views this GPU as a > render-only GPU. > Probably continue to examine what's wrong, except this, drm/amdgpu report > " *ERROR* IB test failed on sdma0 (-110)" to me. > > Does this count as problem? 
No, again that is perfectly expected behavior.

Some BIOSes (or maybe most by modern standards) allow overriding this, but if you later override it from the OS you run the hardware outside of what has been validated.

When you put a VGA device into a board with an integrated VGA device, the integrated one gets disabled. This is even part of some PCIe specification IIRC.

So the problems you run into here are perfectly expected.

Regards,
Christian.

>
> Before I could find solution, I have keep this de-fact render only GPU mounted.
> Because I need recompile kennel module, install the kernel module and testing.
>
> All I need is a 2D video card to display something, ast drm is OK, despite simple.
> It suit the need for my daily usage with VIM, that's enough for me.
>
> Now, the real questions that I want ask is:
>
> 1)
>
> Does the fact that when the kernel driver module got blocked (by modprobe.blacklist=amdgpu),
> while the vgaarb still select it as primary which leave the X server crash there (because no kennel space driver loaded)
> count as a problem?
>
>
> 2)
>
> Does my approach that mounting another GPU as the primary display adapter,
> while its real purpose is to solving bugs and development for another GPU,
> count as a use case?
>
>
> $ cat demsg.txt | grep drm
>
> [...]
>
> [1] https://www.techpowerup.com/gpu-specs/xfx-bc-160.b9346
>
Hi,

On 2023/9/7 17:08, Christian König wrote:
> I strongly suggest that you just completely drop this here

Dropping this is OK, no problem. Then I will go and develop something else.
This version was not intended to be merged in the first place, as it's an RFC.
Also, the core mechanism is already finished; it is the first patch in this series.
What is left is just policy (how to specify a device and parse the kernel command line), and nothing interesting remains.
It is actually there to fulfill the promise I made at V3, which was to give some examples as use cases.

> and go into the AST driver and try to fix it.

Well, someone told me yesterday that this is well-defined behavior, which implies that it is not a bug. I'm not going to fix a non-bug.
But if Thomas asks me to fix it, then I will probably have to try.
But I suggest that if things are not broken, we don't fix them. Otherwise this may cause bigger trouble.
For a server's single-display use case, it is good enough.

Thanks.
On 07.09.23 at 18:33, suijingfeng wrote:
> Hi,
>
>
> On 2023/9/7 17:08, Christian König wrote:
>
>
>> I strongly suggest that you just completely drop this here
>
>
> Drop this is OK, no problem. Then I will go to develop something else.
> This version is not intended to merge originally, as it's a RFC.
> Also, the core mechanism already finished, it is the first patch in this series.
> Things left are just policy (how to specify one and parse the kernel CMD line) and nothing interesting left.
> It is actually to fulfill my promise at V3 which is to give some examples as usage cases.
>
>
>> and go into the AST driver and try to fix it.
>
> Well, someone tell me that this is well defined behavior yesterday,
> which imply that it is not a bug. I'm not going to fix a non-bug.

Sorry for that, I didn't realize what you are actually trying to do.

> But if thomas ask me to fix it, then I probably have to try to fix.
> But I suggest if things not broken, don't fix it. Otherwise this may incur more big trouble.
> For server's single display use case, it is good enough.

Yeah, exactly, that's the reason why you shouldn't mess with this.

In theory you could try to re-program the necessary north bridge blocks to make integrated graphics work even if you installed a dedicated VGA adapter, but you will most likely be missing something.

The only real fix is to tell the BIOS that you want to use the integrated VGA device even if a dedicated one is detected.

If you want to learn more about the background, AMD has a bunch of documentation around this on their website: https://www.amd.com/en/search/documentation/hub.html

The most interesting document for you is probably the BIOS programming manual, but don't ask me what exactly the title of that one is. @Alex do you remember what that was called?

IIRC Intel had similar documentation public, but I don't know offhand where to find it.

Regards,
Christian.

>
>
> Thanks.
>
From: Sui Jingfeng <suijingfeng@loongson.cn>

On a machine with multiple GPUs, a Linux user has no control over which one is primary at boot time. This series tries to solve the above-mentioned problem by introducing the ->be_primary() function stub. Specific device drivers can provide an implementation to hook up with this stub by calling the vga_client_register() function.

Once the driver has bound the device successfully, VGAARB will call back to the device driver to query whether the driver wants its device to be primary or not. Device drivers can just pass NULL if they have no such need.

Please note that:

1) ARM64, LoongArch, and MIPS servers have a lot of PCIe slots, and I would like to mount at least three video cards.

2) Typically, those non-x86 machines don't have good UEFI firmware support, so the primary GPU cannot be selected at the firmware stage. Even on x86, there are old UEFI firmwares which have already made an undesired decision for you.

3) This series is an attempt to solve the remaining problems at the driver level, while another series[1] of mine targets the majority of the problems at the device level.

Tested (limited) on x86 with four video cards mounted. Intel UHD Graphics 630 is the default boot VGA, successfully overridden by ast2400 with ast.modeset=10 appended to the kernel command line.

$ lspci | grep VGA

 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XTX [Radeon HD 8490 / R5 235X OEM]
 04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
 05:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 720] (rev a1)

$ sudo dmesg | grep vgaarb

 pci 0000:00:02.0: vgaarb: setting as boot VGA device
 pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
 pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 vgaarb: loaded
 ast 0000:04:00.0: vgaarb: Override as primary by driver
 i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
 radeon 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
 ast 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

v2:
	* Add a simple implementation for drm/i915 and drm/ast
	* Pick up all tags (Mario)
v3:
	* Fix a mistake in the drm/i915 implementation
	* Fix the patch failing to apply because of a merge conflict.
v4:
	* Focus on solving the real problem.
v1,v2 at https://patchwork.freedesktop.org/series/120059/
   v3 at https://patchwork.freedesktop.org/series/120562/

[1] https://patchwork.freedesktop.org/series/122845/

Sui Jingfeng (9):
  PCI/VGA: Allowing the user to select the primary video adapter at boot time
  drm/nouveau: Implement .be_primary() callback
  drm/radeon: Implement .be_primary() callback
  drm/amdgpu: Implement .be_primary() callback
  drm/i915: Implement .be_primary() callback
  drm/loongson: Implement .be_primary() callback
  drm/ast: Register as a VGA client by calling vga_client_register()
  drm/hibmc: Register as a VGA client by calling vga_client_register()
  drm/gma500: Register as a VGA client by calling vga_client_register()

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 11 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c     | 13 ++++-
 drivers/gpu/drm/ast/ast_drv.c               | 31 ++++++++++
 drivers/gpu/drm/gma500/psb_drv.c            | 57 ++++++++++++++++++-
 .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 15 +++++
 drivers/gpu/drm/i915/display/intel_vga.c    | 15 ++++-
 drivers/gpu/drm/loongson/loongson_module.c  |  2 +-
 drivers/gpu/drm/loongson/loongson_module.h  |  1 +
 drivers/gpu/drm/loongson/lsdc_drv.c         | 10 +++-
 drivers/gpu/drm/nouveau/nouveau_vga.c       | 11 +++-
 drivers/gpu/drm/radeon/radeon_device.c      | 10 +++-
 drivers/pci/vgaarb.c                        | 43 ++++++++++++--
 drivers/vfio/pci/vfio_pci_core.c            |  2 +-
 include/linux/vgaarb.h                      |  8 ++-
 14 files changed, 210 insertions(+), 19 deletions(-)
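
For readers following the thread, the registration flow described in the cover letter (a driver registers a callback, the arbiter queries it once the device is bound, and NULL means "no preference") can be sketched in user space. This is only a mock under stated assumptions, not code from the series: struct mock_dev, struct mock_arbiter, mock_client_register() and mock_ast_be_primary() are hypothetical names, the real vga_client_register() prototype may differ, and the modeset == 10 check merely mirrors the ast.modeset=10 value from the test description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct mock_dev {
	const char *driver;	/* name of the bound driver */
	int modeset;		/* value of the driver's modeset parameter */
};

/* Callback type: returns true if the driver wants its device to be primary. */
typedef bool (*be_primary_fn)(const struct mock_dev *dev);

#define MOCK_MAX_CLIENTS 8

/* Hypothetical arbiter state; "primary" starts out as the firmware default. */
struct mock_arbiter {
	const struct mock_dev *clients[MOCK_MAX_CLIENTS];
	size_t count;
	const struct mock_dev *primary;
};

/*
 * Register a client with the mock arbiter. be_primary may be NULL, in
 * which case the current primary device is left untouched.
 * Returns 0 on success, -1 if the client table is full.
 */
static int mock_client_register(struct mock_arbiter *arb,
				const struct mock_dev *dev,
				be_primary_fn be_primary)
{
	if (arb->count >= MOCK_MAX_CLIENTS)
		return -1;

	arb->clients[arb->count++] = dev;

	/* Query the driver only if it supplied a callback. */
	if (be_primary && be_primary(dev))
		arb->primary = dev;

	return 0;
}

/* Example policy: claim primary only when modeset is 10 (assumed value). */
static bool mock_ast_be_primary(const struct mock_dev *dev)
{
	return dev->modeset == 10;
}
```

In this mock, registering a device with a NULL callback keeps whatever the firmware picked, while registering one with mock_ast_be_primary and modeset set to 10 moves the primary pointer to that device. How the real arbiter resolves several drivers all asking to be primary is a policy question the sketch does not address.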