Message ID | 20130911090135.GB359@x4 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 11.09.2013 11:01, schrieb Markus Trippelsdorf: > On 2013.09.09 at 11:38 +0200, Christian König wrote: >> Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: >>> On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >>>> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >>>> >>>>> Here are a couple of patches that get kexec working with radeon devices. >>>>> I've tested this on my RS780. >>>>> Comments or flames are welcome. >>>>> Thanks. >>>> A couple of high level comments. >>>> >>>> This looks promising for the usual case. >>>> >>>> Removing the printk at the end of the kexec path seems a little dubious, >>>> what of other cpus, interrupt handlers, etc. Basically estabilishing a >>>> new rule on when printk is allowed seems a little dubious at this point, >>>> even if it is a useful debugging trick. >>> OK. I will drop this patch. It doesn't seem to be necessary, because I >>> cannot reproduce the printk related hang anymore. >>> >>>> Having a clean shutdown of the radeon definitely seems worth doing, >>>> because the cases where we care abouty video are when a person is in >>>> front of the system. >>> Yes. But please note that even with radeon_pci_shutdown implemented, I >>> still get ring test failures on roughly every eighth kexec boot: >>> >>> [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) >>> radeon 0000:01:05.0: disabling GPU acceleration >>> >>> That's definitely better than the current state of affairs, with ring >>> test failures on every second boot. But I haven't figured out the reason >>> for these failures yet. It's curious that once a ring test failure >>> occurs, it will reliably fail after each kexec invocation, no matter how >>> often repeated. Only a reboot brings the machine back to normal. >> The main problem here is that the AMD gfx hardware doesn't really >> support being reinitialized once booted (for various reasons). It's a >> (intended) limitation of the hardware design that you can only >> initialize certain blocks once every power cycle, so the whole approach >> actually will never work 100% reliable. >> >> All you can hope for is that stopping the hardware while shutting down >> the old kernel and starting it again results in exactly the same >> hardware parameters (offsets, clock etc...) otherwise starting the >> blocks will just fail and you end up with disabled acceleration like above. >> >> Sorry, but there isn't much we can do about this, > I've tested this further and it turned out that if I revert commit > f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown" > patch, the initialization failures seem to go away completely. > > Any idea what's going on? Well DPM is mostly Alex domain, but if I have to guess I would say that the SCLK is gated by the hardware when the driver is unloaded and since DPM initialized only later not ungated when the driver loads again. > Here's the patch: > > diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c > index fa0de46..4e8c1988 100644 > --- a/drivers/gpu/drm/radeon/r600_dpm.c > +++ b/drivers/gpu/drm/radeon/r600_dpm.c > @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) > void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) > { > if (enable) > - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); > else > - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > } > > void r600_enable_mclk_control(struct radeon_device *rdev, bool enable) The patch just breaks SCLK gating on R6xx again, so no gain here. Christian.
On Wed, Sep 11, 2013 at 5:01 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > On 2013.09.09 at 11:38 +0200, Christian König wrote: >> Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: >> > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >> >> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >> >> >> >>> Here are a couple of patches that get kexec working with radeon devices. >> >>> I've tested this on my RS780. >> >>> Comments or flames are welcome. >> >>> Thanks. >> >> A couple of high level comments. >> >> >> >> This looks promising for the usual case. >> >> >> >> Removing the printk at the end of the kexec path seems a little dubious, >> >> what of other cpus, interrupt handlers, etc. Basically estabilishing a >> >> new rule on when printk is allowed seems a little dubious at this point, >> >> even if it is a useful debugging trick. >> > OK. I will drop this patch. It doesn't seem to be necessary, because I >> > cannot reproduce the printk related hang anymore. >> > >> >> Having a clean shutdown of the radeon definitely seems worth doing, >> >> because the cases where we care abouty video are when a person is in >> >> front of the system. >> > Yes. But please note that even with radeon_pci_shutdown implemented, I >> > still get ring test failures on roughly every eighth kexec boot: >> > >> > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) >> > radeon 0000:01:05.0: disabling GPU acceleration >> > >> > That's definitely better than the current state of affairs, with ring >> > test failures on every second boot. But I haven't figured out the reason >> > for these failures yet. It's curious that once a ring test failure >> > occurs, it will reliably fail after each kexec invocation, no matter how >> > often repeated. Only a reboot brings the machine back to normal. >> >> The main problem here is that the AMD gfx hardware doesn't really >> support being reinitialized once booted (for various reasons). It's a >> (intended) limitation of the hardware design that you can only >> initialize certain blocks once every power cycle, so the whole approach >> actually will never work 100% reliable. >> >> All you can hope for is that stopping the hardware while shutting down >> the old kernel and starting it again results in exactly the same >> hardware parameters (offsets, clock etc...) otherwise starting the >> blocks will just fail and you end up with disabled acceleration like above. >> >> Sorry, but there isn't much we can do about this, > > I've tested this further and it turned out that if I revert commit > f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown" > patch, the initialization failures seem to go away completely. > > Any idea what's going on? > You are disabling dynamic power management with that patch reverted. The patch fixed a copy paste typo in the register. Bit 0 (SCLK_PWRMGT_OFF) of register SCLK_PWRMGT_CNTL controls whether dynamic engine clock control is enabled. Bit 0 (GLOBAL_PWRMGT_EN) of register GENERAL_PWRMGT controls whether dynamic power management (dynamic engine/memory/voltage, controls etc.) is enabled at all. Alex > Here's the patch: > > diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c > index fa0de46..4e8c1988 100644 > --- a/drivers/gpu/drm/radeon/r600_dpm.c > +++ b/drivers/gpu/drm/radeon/r600_dpm.c > @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) > void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) > { > if (enable) > - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); > else > - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > } > > void r600_enable_mclk_control(struct radeon_device *rdev, bool enable) > > > -- > Markus > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c index fa0de46..4e8c1988 100644 --- a/drivers/gpu/drm/radeon/r600_dpm.c +++ b/drivers/gpu/drm/radeon/r600_dpm.c @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) { if (enable) - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); else - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); } void r600_enable_mclk_control(struct radeon_device *rdev, bool enable)