diff mbox series

[RFC,drm-misc-next,v4,3/9] drm/radeon: Implement .be_primary() callback

Message ID 20230904195724.633404-4-sui.jingfeng@linux.dev (mailing list archive)
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series PCI/VGA: Allowing the user to select the primary video adapter at boot time | expand

Commit Message

Sui Jingfeng Sept. 4, 2023, 7:57 p.m. UTC
From: Sui Jingfeng <suijingfeng@loongson.cn>

On a machine with multiple GPUs, a Linux user has no control over which one
is primary at boot time. This patch tries to solve the mentioned problem by
implementing the .be_primary() callback. Pass radeon.modeset=10 on the
kernel cmd line if you really want the device bound by radeon to be the
primary video adapter, no matter what VGAARB say.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
---
 drivers/gpu/drm/radeon/radeon_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Christian König Sept. 5, 2023, 5:50 a.m. UTC | #1
Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
> From: Sui Jingfeng <suijingfeng@loongson.cn>
>
> On a machine with multiple GPUs, a Linux user has no control over which one
> is primary at boot time.

Question is why is that useful? Should we give users the ability to 
control that?

I don't see an use case for this.

Regards,
Christian.

>   This patch tries to solve the mentioned problem by
> implementing the .be_primary() callback. Pass radeon.modeset=10 on the
> kernel cmd line if you really want the device bound by radeon to be the
> primary video adapter, no matter what VGAARB say.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
> ---
>   drivers/gpu/drm/radeon/radeon_device.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 71f2ff39d6a1..b661cd3a8dc2 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1263,6 +1263,14 @@ static const struct vga_switcheroo_client_ops radeon_switcheroo_ops = {
>   	.can_switch = radeon_switcheroo_can_switch,
>   };
>   
> +static bool radeon_want_to_be_primary(struct pci_dev *pdev)
> +{
> +	if (radeon_modeset == 10)
> +		return true;
> +
> +	return false;
> +}
> +
>   /**
>    * radeon_device_init - initialize the driver
>    *
> @@ -1425,7 +1433,7 @@ int radeon_device_init(struct radeon_device *rdev,
>   	/* if we have > 1 VGA cards, then disable the radeon VGA resources */
>   	/* this will fail for cards that aren't VGA class devices, just
>   	 * ignore it */
> -	vga_client_register(rdev->pdev, radeon_vga_set_decode, NULL);
> +	vga_client_register(rdev->pdev, radeon_vga_set_decode, radeon_want_to_be_primary);
>   
>   	if (rdev->flags & RADEON_IS_PX)
>   		runtime = true;
Sui Jingfeng Sept. 5, 2023, 5:24 p.m. UTC | #2
Hi,


On 2023/9/5 13:50, Christian König wrote:
> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>
>> On a machine with multiple GPUs, a Linux user has no control over 
>> which one
>> is primary at boot time.
>
> Question is why is that useful? Should we give users the ability to 
> control that?
>
> I don't see an use case for this.
>

On a specific machine with multiple GPUs mounted, only the
primary graphics get POST-ed (initialized) by the firmware.
Therefore the DRM drivers for the rest video cards have to
work without the prerequisite setups done by firmware, This
is called as POST.

One of the use cases is to test if a specific DRM driver
would works properly, under the circumstance of not being
POST-ed, The ast drm driver is the first one which refused
to work if not being POST-ed by the firmware.

Before apply this series, I was unable make drm/ast as the
primary video card easily. The problem is that on a multiple
video card configuration, the monitor connected with my
AST2400 card not light up. While confusing, a naive programmer
may suspect the PRIME is not working.

After applied this series and passing ast.modeset=10 on the
kernel cmd line, I found that the monitor connected with my
ast2400 video card still black, It doesn't display and It
doesn't show image to me.

While in the process of study drm/ast, I know that drm/ast
driver has the POST code shipped, See the ast_post_gpu() function.
Then, I was wondering why this function doesn't works.

After a short-time (hasty) debugging, I found that the ast_post_gpu()
function didn't get run. Because it have something to do with the
ast->config_mode. Without thinking too much, I hardcoded the
ast->config_mode as ast_use_p2a, the key point is to force the
ast_post_gpu() function to run.


```

--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device 
*ast)
                 }
         }

+       ast->config_mode = ast_use_p2a;
+
         switch (ast->config_mode) {
         case ast_use_defaults:
                 drm_info(dev, "Using default configuration\n");

```

Then, the monitor light up, it display the Ubuntu greeter to me. Therefore
my patch is useful, at least for the Linux drm driver tester and developer.
It allow programmers to test the specific part of a specific driver without
changing a line of the source code and without the need of sudo authority.

It improves the efficiency of the testing and patch verification. I know
the PrimaryGPU option of Xorg conf, but this approach will remember the
setup have been made, you need modify it with root authority each time
you want to switch the primary. But on the process of rapid developing
and/or testing for multiple video drivers, with only one computer hardware
resource available. What we really want is a one-shot command, as provided
by this series.  So, this is the first use case.


The second use case is that sometime the firmware is not reliable.
While there are thousands of ARM64, PowerPC and Mips servers machine,
Most of them don't have a good UEFI firmware support. I haven't test the
drm/amdgpu and drm/radeon at my ARM64 server yet. Because this ARM64
server always use the platform(BMC) integrated display controller as primary.
The UEFI firmware of it does not provide options menu to tune.
So, for the first time, the discrete card because useless, despite more powerful.
I will take time to carry on the testing, so I will be able to tell more
in the future.


Even on X86, when select the PEG as primary on the UEFI BIOS menu.
There is no way to tell the bios which one of my three
discrete video be the primary. Not to mention some old UEFI
firmware, which doesn't provide a setting at all.
While the benefit of my approach is the flexibility.
Yes the i915, amdgpu and radeon are good quality,
but there may have programmers want to try nouveau.


The third use case is that VGAARB is also not reliable, It will
select a wrong device as primary. Especially on Arm64, Loongarch
and mips arch etc. And the X server will use this wrong device
as primary and completely crash there. Either because of lacking
a driver or the driver has a bug which can not bear the graphic
environment up. VGAARB is firmware dependent.
My patch provide a temporary method to rescue.


The forth is probably the PRIME and reverse PRIME development
and developing driver for new video cards.
Alex Deucher Sept. 6, 2023, 4 p.m. UTC | #3
On Tue, Sep 5, 2023 at 1:25 PM suijingfeng <suijingfeng@loongson.cn> wrote:
>
> Hi,
>
>
> On 2023/9/5 13:50, Christian König wrote:
> > Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
> >> From: Sui Jingfeng <suijingfeng@loongson.cn>
> >>
> >> On a machine with multiple GPUs, a Linux user has no control over
> >> which one
> >> is primary at boot time.
> >
> > Question is why is that useful? Should we give users the ability to
> > control that?
> >
> > I don't see an use case for this.
> >
>
> On a specific machine with multiple GPUs mounted, only the
> primary graphics get POST-ed (initialized) by the firmware.
> Therefore the DRM drivers for the rest video cards have to
> work without the prerequisite setups done by firmware, This
> is called as POST.

I think that should be regarded as a bug in the driver that should be
fixed and this would not help with that case.  If a driver can't
initialize a device without aid from the pre-OS environment, that
should be fixed in the driver.  This solution also doesn't fix which
device is selected as the primary by the pre-OS environment.  That can
only be fixed in the pre-OS environment code.

>
> One of the use cases is to test if a specific DRM driver
> would works properly, under the circumstance of not being
> POST-ed, The ast drm driver is the first one which refused
> to work if not being POST-ed by the firmware.
>
> Before apply this series, I was unable make drm/ast as the
> primary video card easily. The problem is that on a multiple
> video card configuration, the monitor connected with my
> AST2400 card not light up. While confusing, a naive programmer
> may suspect the PRIME is not working.
>
> After applied this series and passing ast.modeset=10 on the
> kernel cmd line, I found that the monitor connected with my
> ast2400 video card still black, It doesn't display and It
> doesn't show image to me.

The problem with adding modeset=10 is that it only helps when you have
one GPU driven by that driver in the system.  If you have multiple
GPUs driven by that driver, which one would that apply to?  E.g., what
if you have 2 AMD GPUs in the system.

>
> While in the process of study drm/ast, I know that drm/ast
> driver has the POST code shipped, See the ast_post_gpu() function.
> Then, I was wondering why this function doesn't works.
>
> After a short-time (hasty) debugging, I found that the ast_post_gpu()
> function didn't get run. Because it have something to do with the
> ast->config_mode. Without thinking too much, I hardcoded the
> ast->config_mode as ast_use_p2a, the key point is to force the
> ast_post_gpu() function to run.
>
>
> ```
>
> --- a/drivers/gpu/drm/ast/ast_main.c
> +++ b/drivers/gpu/drm/ast/ast_main.c
> @@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device
> *ast)
>                  }
>          }
>
> +       ast->config_mode = ast_use_p2a;
> +
>          switch (ast->config_mode) {
>          case ast_use_defaults:
>                  drm_info(dev, "Using default configuration\n");
>
> ```
>
> Then, the monitor light up, it display the Ubuntu greeter to me. Therefore
> my patch is useful, at least for the Linux drm driver tester and developer.
> It allow programmers to test the specific part of a specific driver without
> changing a line of the source code and without the need of sudo authority.
>
> It improves the efficiency of the testing and patch verification. I know
> the PrimaryGPU option of Xorg conf, but this approach will remember the
> setup have been made, you need modify it with root authority each time
> you want to switch the primary. But on the process of rapid developing
> and/or testing for multiple video drivers, with only one computer hardware
> resource available. What we really want is a one-shot command, as provided
> by this series.  So, this is the first use case.
>
>
> The second use case is that sometime the firmware is not reliable.
> While there are thousands of ARM64, PowerPC and Mips servers machine,
> Most of them don't have a good UEFI firmware support. I haven't test the
> drm/amdgpu and drm/radeon at my ARM64 server yet. Because this ARM64
> server always use the platform(BMC) integrated display controller as primary.
> The UEFI firmware of it does not provide options menu to tune.
> So, for the first time, the discrete card because useless, despite more powerful.
> I will take time to carry on the testing, so I will be able to tell more
> in the future.
>
>
> Even on X86, when select the PEG as primary on the UEFI BIOS menu.
> There is no way to tell the bios which one of my three
> discrete video be the primary. Not to mention some old UEFI
> firmware, which doesn't provide a setting at all.
> While the benefit of my approach is the flexibility.
> Yes the i915, amdgpu and radeon are good quality,
> but there may have programmers want to try nouveau.
>
>
> The third use case is that VGAARB is also not reliable, It will
> select a wrong device as primary. Especially on Arm64, Loongarch
> and mips arch etc. And the X server will use this wrong device
> as primary and completely crash there. Either because of lacking
> a driver or the driver has a bug which can not bear the graphic
> environment up. VGAARB is firmware dependent.
> My patch provide a temporary method to rescue.
>

It sounds like we need a general purpose "primary" selector.  I think
it's sort of orthogonal to VGA.  VGAARB is just for managing VGA
routing on PCI.  It's not really directly related to which GPU you
want to be the primary when the OS loads.  Maybe some new kernel
parameter, E.g., primary_display=<string> where the string would be
the specific device you wanted to be the primary,  E.g., you could use
the PCI BDF on PCI devices, primary_display=0000:0a:00.0 or some other
device string for non-PCI devices.

Alex
Sui Jingfeng Sept. 7, 2023, 1:40 a.m. UTC | #4
On 2023/9/7 00:00, Alex Deucher wrote:
> On Tue, Sep 5, 2023 at 1:25 PM suijingfeng <suijingfeng@loongson.cn> wrote:
>> Hi,
>>
>>
>> On 2023/9/5 13:50, Christian König wrote:
>>> Am 04.09.23 um 21:57 schrieb Sui Jingfeng:
>>>> From: Sui Jingfeng <suijingfeng@loongson.cn>
>>>>
>>>> On a machine with multiple GPUs, a Linux user has no control over
>>>> which one
>>>> is primary at boot time.
>>> Question is why is that useful? Should we give users the ability to
>>> control that?
>>>
>>> I don't see an use case for this.
>>>
>> On a specific machine with multiple GPUs mounted, only the
>> primary graphics get POST-ed (initialized) by the firmware.
>> Therefore the DRM drivers for the rest video cards have to
>> work without the prerequisite setups done by firmware, This
>> is called as POST.
> I think that should be regarded as a bug in the driver that should be
> fixed and this would not help with that case.  If a driver can't
> initialize a device without aid from the pre-OS environment, that
> should be fixed in the driver.  This solution also doesn't fix which
> device is selected as the primary by the pre-OS environment.  That can
> only be fixed in the pre-OS environment code.
>
>> One of the use cases is to test if a specific DRM driver
>> would works properly, under the circumstance of not being
>> POST-ed, The ast drm driver is the first one which refused
>> to work if not being POST-ed by the firmware.
>>
>> Before apply this series, I was unable make drm/ast as the
>> primary video card easily. The problem is that on a multiple
>> video card configuration, the monitor connected with my
>> AST2400 card not light up. While confusing, a naive programmer
>> may suspect the PRIME is not working.
>>
>> After applied this series and passing ast.modeset=10 on the
>> kernel cmd line, I found that the monitor connected with my
>> ast2400 video card still black, It doesn't display and It
>> doesn't show image to me.
> The problem with adding modeset=10 is that it only helps when you have
> one GPU driven by that driver in the system.  If you have multiple
> GPUs driven by that driver, which one would that apply to?  E.g., what
> if you have 2 AMD GPUs in the system.
>
>> While in the process of study drm/ast, I know that drm/ast
>> driver has the POST code shipped, See the ast_post_gpu() function.
>> Then, I was wondering why this function doesn't works.
>>
>> After a short-time (hasty) debugging, I found that the ast_post_gpu()
>> function didn't get run. Because it have something to do with the
>> ast->config_mode. Without thinking too much, I hardcoded the
>> ast->config_mode as ast_use_p2a, the key point is to force the
>> ast_post_gpu() function to run.
>>
>>
>> ```
>>
>> --- a/drivers/gpu/drm/ast/ast_main.c
>> +++ b/drivers/gpu/drm/ast/ast_main.c
>> @@ -132,6 +132,8 @@ static int ast_device_config_init(struct ast_device
>> *ast)
>>                   }
>>           }
>>
>> +       ast->config_mode = ast_use_p2a;
>> +
>>           switch (ast->config_mode) {
>>           case ast_use_defaults:
>>                   drm_info(dev, "Using default configuration\n");
>>
>> ```
>>
>> Then, the monitor light up, it display the Ubuntu greeter to me. Therefore
>> my patch is useful, at least for the Linux drm driver tester and developer.
>> It allow programmers to test the specific part of a specific driver without
>> changing a line of the source code and without the need of sudo authority.
>>
>> It improves the efficiency of the testing and patch verification. I know
>> the PrimaryGPU option of Xorg conf, but this approach will remember the
>> setup have been made, you need modify it with root authority each time
>> you want to switch the primary. But on the process of rapid developing
>> and/or testing for multiple video drivers, with only one computer hardware
>> resource available. What we really want is a one-shot command, as provided
>> by this series.  So, this is the first use case.
>>
>>
>> The second use case is that sometime the firmware is not reliable.
>> While there are thousands of ARM64, PowerPC and Mips servers machine,
>> Most of them don't have a good UEFI firmware support. I haven't test the
>> drm/amdgpu and drm/radeon at my ARM64 server yet. Because this ARM64
>> server always use the platform(BMC) integrated display controller as primary.
>> The UEFI firmware of it does not provide options menu to tune.
>> So, for the first time, the discrete card because useless, despite more powerful.
>> I will take time to carry on the testing, so I will be able to tell more
>> in the future.
>>
>>
>> Even on X86, when select the PEG as primary on the UEFI BIOS menu.
>> There is no way to tell the bios which one of my three
>> discrete video be the primary. Not to mention some old UEFI
>> firmware, which doesn't provide a setting at all.
>> While the benefit of my approach is the flexibility.
>> Yes the i915, amdgpu and radeon are good quality,
>> but there may have programmers want to try nouveau.
>>
>>
>> The third use case is that VGAARB is also not reliable, It will
>> select a wrong device as primary. Especially on Arm64, Loongarch
>> and mips arch etc. And the X server will use this wrong device
>> as primary and completely crash there. Either because of lacking
>> a driver or the driver has a bug which can not bear the graphic
>> environment up. VGAARB is firmware dependent.
>> My patch provide a temporary method to rescue.
>>
> It sounds like we need a general purpose "primary" selector.  I think
> it's sort of orthogonal to VGA.  VGAARB is just for managing VGA
> routing on PCI.  It's not really directly related to which GPU you
> want to be the primary when the OS loads.  Maybe some new kernel
> parameter, E.g., primary_display=<string> where the string would be
> the specific device you wanted to be the primary,  E.g., you could use
> the PCI BDF on PCI devices, primary_display=0000:0a:00.0 or some other
> device string for non-PCI devices.


Indeed, thanks for you give the right direction.
Bjorn and Willianson also told me something similar,
but I didn't realized until today.

historically, VGAARB is only for managing VGA compatible device.
For legacy BIOS environment, the routing related to this need to handled.
I didn't realized those technique point by the time this series is sending.
Beside this, It not uncommon that a specific machine ship multiple identical
GPUs models Or GPUs from the same company (likely a integrated one and a discrete one)
My naive approach in v4 cannot solve complex problems like this.
I will try to solve this problems and concerns at the next version.

Thanks.


> Alex
diff mbox series

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 71f2ff39d6a1..b661cd3a8dc2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1263,6 +1263,14 @@  static const struct vga_switcheroo_client_ops radeon_switcheroo_ops = {
 	.can_switch = radeon_switcheroo_can_switch,
 };
 
+static bool radeon_want_to_be_primary(struct pci_dev *pdev)
+{
+	if (radeon_modeset == 10)
+		return true;
+
+	return false;
+}
+
 /**
  * radeon_device_init - initialize the driver
  *
@@ -1425,7 +1433,7 @@  int radeon_device_init(struct radeon_device *rdev,
 	/* if we have > 1 VGA cards, then disable the radeon VGA resources */
 	/* this will fail for cards that aren't VGA class devices, just
 	 * ignore it */
-	vga_client_register(rdev->pdev, radeon_vga_set_decode, NULL);
+	vga_client_register(rdev->pdev, radeon_vga_set_decode, radeon_want_to_be_primary);
 
 	if (rdev->flags & RADEON_IS_PX)
 		runtime = true;