Revert "drm/amdgpu: init iommu after amdkfd device init"

Message ID 20240523173031.4212-1-W_Armin@gmx.de (mailing list archive)
State New, archived

Commit Message

Armin Wolf May 23, 2024, 5:30 p.m. UTC
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler <bkauler@gmail.com>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--
2.39.2

Comments

Armin Wolf June 3, 2024, 10:19 p.m. UTC | #1
On 23.05.24 at 19:30, Armin Wolf wrote:

> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>
> A user reported that this commit breaks the integrated gpu of his
> notebook, causing a black screen. He was able to bisect the problematic
> commit and verified that by reverting it the notebook works again.
> He also confirmed that kernel 6.8.1 also works on his device, so the
> upstream commit itself seems to be ok.
>
> An amdgpu developer (Alex Deucher) confirmed that this patch should
> have never been ported to 5.15 in the first place, so revert this
> commit from the 5.15 stable series.

Hi,

what is the status of this?

Armin Wolf

>
> Reported-by: Barry Kauler <bkauler@gmail.com>
> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 222a1d9ecf16..5f6c32ec674d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>   	if (r)
>   		goto init_failed;
>
> +	r = amdgpu_amdkfd_resume_iommu(adev);
> +	if (r)
> +		goto init_failed;
> +
>   	r = amdgpu_device_ip_hw_init_phase1(adev);
>   	if (r)
>   		goto init_failed;
> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>   	if (!adev->gmc.xgmi.pending_reset)
>   		amdgpu_amdkfd_device_init(adev);
>
> -	r = amdgpu_amdkfd_resume_iommu(adev);
> -	if (r)
> -		goto init_failed;
> -
>   	amdgpu_fru_get_product_info(adev);
>
>   init_failed:
> --
> 2.39.2
>
>
Felix Kuehling June 4, 2024, 6:24 p.m. UTC | #2
On 2024-06-03 18:19, Armin Wolf wrote:
> On 23.05.24 at 19:30, Armin Wolf wrote:
>
>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>
>> A user reported that this commit breaks the integrated gpu of his
>> notebook, causing a black screen. He was able to bisect the problematic
>> commit and verified that by reverting it the notebook works again.
>> He also confirmed that kernel 6.8.1 also works on his device, so the
>> upstream commit itself seems to be ok.
>>
>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>> have never been ported to 5.15 in the first place, so revert this
>> commit from the 5.15 stable series.
>
> Hi,
>
> what is the status of this?

Which branch is this for? This patch won't apply to anything after Linux 
6.5. Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:

commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jul 28 12:20:12 2023 -0400

     drm/amdkfd: drop IOMMUv2 support

     Now that we use the dGPU path for all APUs, drop the
     IOMMUv2 support.

     v2: drop the now unused queue manager functions for gfx7/8 APUs

     Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
     Acked-by: Christian König <christian.koenig@amd.com>
     Tested-by: Mike Lothian <mike@fireburn.co.uk>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Regards,
   Felix


>
> Armin Wolf
>
>>
>> Reported-by: Barry Kauler <bkauler@gmail.com>
>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 222a1d9ecf16..5f6c32ec674d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct 
>> amdgpu_device *adev)
>>       if (r)
>>           goto init_failed;
>>
>> +    r = amdgpu_amdkfd_resume_iommu(adev);
>> +    if (r)
>> +        goto init_failed;
>> +
>>       r = amdgpu_device_ip_hw_init_phase1(adev);
>>       if (r)
>>           goto init_failed;
>> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct 
>> amdgpu_device *adev)
>>       if (!adev->gmc.xgmi.pending_reset)
>>           amdgpu_amdkfd_device_init(adev);
>>
>> -    r = amdgpu_amdkfd_resume_iommu(adev);
>> -    if (r)
>> -        goto init_failed;
>> -
>>       amdgpu_fru_get_product_info(adev);
>>
>>   init_failed:
>> -- 
>> 2.39.2
>>
>>
Alex Deucher June 4, 2024, 6:28 p.m. UTC | #3
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Tuesday, June 4, 2024 2:25 PM
> To: Armin Wolf <W_Armin@gmx.de>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
> gregkh@linuxfoundation.org; sashal@kernel.org
> Cc: stable@vger.kernel.org; bkauler@gmail.com; Zhang, Yifan
> <Yifan1.Zhang@amd.com>; Liang, Prike <Prike.Liang@amd.com>; dri-
> devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
> init"
>
>
> On 2024-06-03 18:19, Armin Wolf wrote:
> > On 23.05.24 at 19:30, Armin Wolf wrote:
> >
> >> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
> >>
> >> A user reported that this commit breaks the integrated gpu of his
> >> notebook, causing a black screen. He was able to bisect the
> >> problematic commit and verified that by reverting it the notebook works
> again.
> >> He also confirmed that kernel 6.8.1 also works on his device, so the
> >> upstream commit itself seems to be ok.
> >>
> >> An amdgpu developer (Alex Deucher) confirmed that this patch should
> >> have never been ported to 5.15 in the first place, so revert this
> >> commit from the 5.15 stable series.
> >
> > Hi,
> >
> > what is the status of this?
>
> Which branch is this for? This patch won't apply to anything after Linux 6.5.

It's applicable to 5.15 stable only.  The original patch caused a regression on 5.15 so probably should not have been applied there.

Alex


> Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:
>
> commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Fri Jul 28 12:20:12 2023 -0400
>
>      drm/amdkfd: drop IOMMUv2 support
>
>      Now that we use the dGPU path for all APUs, drop the
>      IOMMUv2 support.
>
>      v2: drop the now unused queue manager functions for gfx7/8 APUs
>
>      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>      Acked-by: Christian König <christian.koenig@amd.com>
>      Tested-by: Mike Lothian <mike@fireburn.co.uk>
>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> Regards,
>    Felix
>
>
> >
> > Armin Wolf
> >
> >>
> >> Reported-by: Barry Kauler <bkauler@gmail.com>
> >> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 222a1d9ecf16..5f6c32ec674d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
> >> amdgpu_device *adev)
> >>       if (r)
> >>           goto init_failed;
> >>
> >> +    r = amdgpu_amdkfd_resume_iommu(adev);
> >> +    if (r)
> >> +        goto init_failed;
> >> +
> >>       r = amdgpu_device_ip_hw_init_phase1(adev);
> >>       if (r)
> >>           goto init_failed;
> >> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
> >> amdgpu_device *adev)
> >>       if (!adev->gmc.xgmi.pending_reset)
> >>           amdgpu_amdkfd_device_init(adev);
> >>
> >> -    r = amdgpu_amdkfd_resume_iommu(adev);
> >> -    if (r)
> >> -        goto init_failed;
> >> -
> >>       amdgpu_fru_get_product_info(adev);
> >>
> >>   init_failed:
> >> --
> >> 2.39.2
> >>
> >>
Armin Wolf June 10, 2024, 2:28 p.m. UTC | #4
On 04.06.24 at 20:28, Deucher, Alexander wrote:

> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling@amd.com>
>> Sent: Tuesday, June 4, 2024 2:25 PM
>> To: Armin Wolf <W_Armin@gmx.de>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Koenig, Christian
>> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
>> gregkh@linuxfoundation.org; sashal@kernel.org
>> Cc: stable@vger.kernel.org; bkauler@gmail.com; Zhang, Yifan
>> <Yifan1.Zhang@amd.com>; Liang, Prike <Prike.Liang@amd.com>; dri-
>> devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
>> init"
>>
>>
>> On 2024-06-03 18:19, Armin Wolf wrote:
>>> On 23.05.24 at 19:30, Armin Wolf wrote:
>>>
>>>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>>>
>>>> A user reported that this commit breaks the integrated gpu of his
>>>> notebook, causing a black screen. He was able to bisect the
>>>> problematic commit and verified that by reverting it the notebook works
>> again.
>>>> He also confirmed that kernel 6.8.1 also works on his device, so the
>>>> upstream commit itself seems to be ok.
>>>>
>>>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>>>> have never been ported to 5.15 in the first place, so revert this
>>>> commit from the 5.15 stable series.
>>> Hi,
>>>
>>> what is the status of this?
>> Which branch is this for? This patch won't apply to anything after Linux 6.5.
> It's applicable to 5.15 stable only.  The original patch caused a regression on 5.15 so probably should not have been applied there.
>
> Alex
>
Correct, and I would be very grateful if this regression could be resolved in the near future.
The user already wrote a blog post about the whole issue, see here:

https://bkhome.org/news/202405/kernel-amd-gpu-disaster-fixed.html

Thanks,
Armin Wolf

>> Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:
>>
>> commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
>> Author: Alex Deucher <alexander.deucher@amd.com>
>> Date:   Fri Jul 28 12:20:12 2023 -0400
>>
>>       drm/amdkfd: drop IOMMUv2 support
>>
>>       Now that we use the dGPU path for all APUs, drop the
>>       IOMMUv2 support.
>>
>>       v2: drop the now unused queue manager functions for gfx7/8 APUs
>>
>>       Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>       Acked-by: Christian König <christian.koenig@amd.com>
>>       Tested-by: Mike Lothian <mike@fireburn.co.uk>
>>       Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> Regards,
>>     Felix
>>
>>
>>> Armin Wolf
>>>
>>>> Reported-by: Barry Kauler <bkauler@gmail.com>
>>>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 222a1d9ecf16..5f6c32ec674d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
>>>> amdgpu_device *adev)
>>>>        if (r)
>>>>            goto init_failed;
>>>>
>>>> +    r = amdgpu_amdkfd_resume_iommu(adev);
>>>> +    if (r)
>>>> +        goto init_failed;
>>>> +
>>>>        r = amdgpu_device_ip_hw_init_phase1(adev);
>>>>        if (r)
>>>>            goto init_failed;
>>>> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
>>>> amdgpu_device *adev)
>>>>        if (!adev->gmc.xgmi.pending_reset)
>>>>            amdgpu_amdkfd_device_init(adev);
>>>>
>>>> -    r = amdgpu_amdkfd_resume_iommu(adev);
>>>> -    if (r)
>>>> -        goto init_failed;
>>>> -
>>>>        amdgpu_fru_get_product_info(adev);
>>>>
>>>>    init_failed:
>>>> --
>>>> 2.39.2
>>>>
>>>>
Matthew Ruffell June 12, 2024, 12:10 a.m. UTC | #5
Hi Greg KH, Sasha,

Please pick up this patch for 5.15 stable tree. I have built a test kernel and
can confirm that it fixes affected users.

Downstream bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Thanks,
Matthew
Greg KH June 12, 2024, 12:44 p.m. UTC | #6
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
> 
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
> 
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;

+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);

-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);

 init_failed:
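
For readers who want to see the effect of the two hunks at a glance, the
following is a minimal userspace sketch of the call order that the revert
restores in amdgpu_device_ip_init(): the IOMMU resume runs before hardware
init phase 1 again, instead of after the amdkfd device init. It is not
kernel code. The functions resume_iommu(), hw_init_phase1(),
amdkfd_device_init() and fru_get_product_info() are hypothetical stubs that
only record their names, and record(), called_before() and
ip_init_after_revert() are illustration helpers. The steps between the two
hunks and the pending_reset check are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel functions named in the patch.
 * They only record the order in which they are called. */
#define MAX_CALLS 8
static const char *call_log[MAX_CALLS];
static int call_count;

static int record(const char *name)
{
	assert(call_count < MAX_CALLS);
	call_log[call_count++] = name;
	return 0; /* 0 means success, following the kernel convention */
}

static int resume_iommu(void)          { return record("resume_iommu"); }
static int hw_init_phase1(void)        { return record("hw_init_phase1"); }
static void amdkfd_device_init(void)   { record("amdkfd_device_init"); }
static void fru_get_product_info(void) { record("fru_get_product_info"); }

/* Simplified model of the ordering after the revert is applied. */
static int ip_init_after_revert(void)
{
	int r;

	/* First hunk: the IOMMU resume is back ahead of hw init phase 1. */
	r = resume_iommu();
	if (r)
		goto init_failed;

	r = hw_init_phase1();
	if (r)
		goto init_failed;

	/* ... steps between the two hunks are not modelled here ... */

	/* Second hunk: no IOMMU resume after the amdkfd device init. */
	amdkfd_device_init();
	fru_get_product_info();

init_failed:
	return r;
}

/* Returns 1 if `first` was recorded earlier than `second`, else 0. */
static int called_before(const char *first, const char *second)
{
	int i, first_idx = -1, second_idx = -1;

	for (i = 0; i < call_count; i++) {
		if (first_idx < 0 && strcmp(call_log[i], first) == 0)
			first_idx = i;
		if (second_idx < 0 && strcmp(call_log[i], second) == 0)
			second_idx = i;
	}
	return first_idx >= 0 && second_idx >= 0 && first_idx < second_idx;
}
```

Calling ip_init_after_revert() from a small main() and then checking
called_before("resume_iommu", "hw_init_phase1") is one way to confirm the
restored ordering in this model. It says nothing about why the other
ordering breaks the affected hardware on 5.15.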