[0/3] Fix Navi3x boot and hotplug problems

Message ID 20230926225955.386553-1-mario.limonciello@amd.com (mailing list archive)

Message

Mario Limonciello Sept. 26, 2023, 10:59 p.m. UTC
On some OEM systems, multiple Navi3x dGPUs are triggering RAS errors
and BACO errors.

These errors come from elements of the OEM system that weren't part of
the original test environment.  This series addresses those problems.

NOTE: Although this series touches two subsystems, I would prefer to
take this all through DRM because there is a workaround in linux-next
that I would like to be reverted at the same time as picking up the first
two patches.

Mario Limonciello (3):
  drm/amd: Fix detection of _PR3 on the PCIe root port
  power: supply: Don't count 'unknown' scope power supplies
  Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
    13.0.0"

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c       | 3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
 drivers/power/supply/power_supply_core.c             | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

Comments

Alex Deucher Sept. 28, 2023, 6 p.m. UTC | #1
On Thu, Sep 28, 2023 at 12:41 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On some OEM systems, multiple Navi3x dGPUs are triggering RAS errors
> and BACO errors.
>
> These errors come from elements of the OEM system that weren't part of
> the original test environment.  This series addresses those problems.
>
> NOTE: Although this series touches two subsystems, I would prefer to
> take this all through DRM because there is a workaround in linux-next
> that I would like to be reverted at the same time as picking up the first
> two patches.

FWIW, the workaround is not in linux-next yet.  At the time I thought
it was already fixed by the fixes in ucsi and power supply when we
first encountered this.

Alex

>
> Mario Limonciello (3):
>   drm/amd: Fix detection of _PR3 on the PCIe root port
>   power: supply: Don't count 'unknown' scope power supplies
>   Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
>     13.0.0"
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 2 +-
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c       | 3 ++-
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
>  drivers/power/supply/power_supply_core.c             | 2 +-
>  4 files changed, 5 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
>
Mario Limonciello Sept. 28, 2023, 8:01 p.m. UTC | #2
On 9/28/2023 13:00, Alex Deucher wrote:
> On Thu, Sep 28, 2023 at 12:41 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
>>
>> On some OEM systems, multiple Navi3x dGPUs are triggering RAS errors
>> and BACO errors.
>>
>> These errors come from elements of the OEM system that weren't part of
>> the original test environment.  This series addresses those problems.
>>
>> NOTE: Although this series touches two subsystems, I would prefer to
>> take this all through DRM because there is a workaround in linux-next
>> that I would like to be reverted at the same time as picking up the first
>> two patches.
> 
> FWIW, the workaround is not in linux-next yet.  At the time I thought
> it was already fixed by the fixes in ucsi and power supply when we
> first encountered this.

I looked yesterday and did see it there, but I think that was only
because linux-next had merged the amd-staging-drm-next tree at that point.
It's not there today.

If Sebastian is OK with it, I'd still rather keep it all together so that
people testing amd-staging-drm-next get the fixes.

> 
> Alex
> 
>>
>> Mario Limonciello (3):
>>    drm/amd: Fix detection of _PR3 on the PCIe root port
>>    power: supply: Don't count 'unknown' scope power supplies
>>    Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
>>      13.0.0"
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 2 +-
>>   drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c       | 3 ++-
>>   drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
>>   drivers/power/supply/power_supply_core.c             | 2 +-
>>   4 files changed, 5 insertions(+), 3 deletions(-)
>>
>> --
>> 2.34.1
>>