Message ID | 1599072130-10043-4-git-send-email-andrey.grodzovsky@amd.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | Implement PCI Error Recovery on Navi12 | expand |
On Wed, Sep 02, 2020 at 02:42:05PM -0400, Andrey Grodzovsky wrote: > Wait for HW/PSP initiated ASIC reset to complete before > starting the recovery operations. > > v2: Remove typo > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index e999f1f..412d07e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) > { > struct drm_device *dev = pci_get_drvdata(pdev); > struct amdgpu_device *adev = drm_to_adev(dev); > - int r; > + int r, i; > bool vram_lost; > + u32 memsize; > > DRM_INFO("PCI error: slot reset callback!!\n"); > > + /* wait for asic to come out of reset */ I know it's totally OCD, but it is a minor speed bump to read "asic" here and "ASIC" in the commit log above and the new comment below. > + msleep(500); > + > pci_restore_state(pdev); > > - adev->in_pci_err_recovery = true; > + /* confirm ASIC came out of reset */ > + for (i = 0; i < adev->usec_timeout; i++) { > + memsize = amdgpu_asic_get_config_memsize(adev); > + > + if (memsize != 0xffffffff) I guess this is a spot where you actually depend on an MMIO read returning 0xffffffff because adev->in_pci_err_recovery is false at this point, so amdgpu_mm_rreg() will actually *do* the MMIO read instead of returning 0. Right? > + break; > + udelay(1); > + } > + if (memsize == 0xffffffff) { > + r = -ETIME; > + goto out; > + } > + > + /* TODO Call amdgpu_pre_asic_reset instead */ > + adev->in_pci_err_recovery = true; > r = amdgpu_device_ip_suspend(adev); > adev->in_pci_err_recovery = false; > if (r) > -- > 2.7.4 >
On 9/2/20 6:05 PM, Bjorn Helgaas wrote: > On Wed, Sep 02, 2020 at 02:42:05PM -0400, Andrey Grodzovsky wrote: >> Wait for HW/PSP initiated ASIC reset to complete before >> starting the recovery operations. >> >> v2: Remove typo >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++++++++++++++++-- >> 1 file changed, 20 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index e999f1f..412d07e 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) >> { >> struct drm_device *dev = pci_get_drvdata(pdev); >> struct amdgpu_device *adev = drm_to_adev(dev); >> - int r; >> + int r, i; >> bool vram_lost; >> + u32 memsize; >> >> DRM_INFO("PCI error: slot reset callback!!\n"); >> >> + /* wait for asic to come out of reset */ > I know it's totally OCD, but it is a minor speed bump to read "asic" > here and "ASIC" in the commit log above and the new comment below. > >> + msleep(500); >> + >> pci_restore_state(pdev); >> >> - adev->in_pci_err_recovery = true; >> + /* confirm ASIC came out of reset */ >> + for (i = 0; i < adev->usec_timeout; i++) { >> + memsize = amdgpu_asic_get_config_memsize(adev); >> + >> + if (memsize != 0xffffffff) > I guess this is a spot where you actually depend on an MMIO read > returning 0xffffffff because adev->in_pci_err_recovery is false at > this point, so amdgpu_mm_rreg() will actually *do* the MMIO read > instead of returning 0. Right? Yes, i picked this form another place where they use it to confirm PCI confspace and the BARs specifically are restored and active and so you can read back sane MMIO regs values. Andrey > >> + break; >> + udelay(1); >> + } >> + if (memsize == 0xffffffff) { >> + r = -ETIME; >> + goto out; >> + } >> + >> + /* TODO Call amdgpu_pre_asic_reset instead */ >> + adev->in_pci_err_recovery = true; >> r = amdgpu_device_ip_suspend(adev); >> adev->in_pci_err_recovery = false; >> if (r) >> -- >> 2.7.4 >>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e999f1f..412d07e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) { struct drm_device *dev = pci_get_drvdata(pdev); struct amdgpu_device *adev = drm_to_adev(dev); - int r; + int r, i; bool vram_lost; + u32 memsize; DRM_INFO("PCI error: slot reset callback!!\n"); + /* wait for asic to come out of reset */ + msleep(500); + pci_restore_state(pdev); - adev->in_pci_err_recovery = true; + /* confirm ASIC came out of reset */ + for (i = 0; i < adev->usec_timeout; i++) { + memsize = amdgpu_asic_get_config_memsize(adev); + + if (memsize != 0xffffffff) + break; + udelay(1); + } + if (memsize == 0xffffffff) { + r = -ETIME; + goto out; + } + + /* TODO Call amdgpu_pre_asic_reset instead */ + adev->in_pci_err_recovery = true; r = amdgpu_device_ip_suspend(adev); adev->in_pci_err_recovery = false; if (r)