Message ID | 20171107062617.4227-1-dhinakaran.pandiyan@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: > Some HW vblank counters reset due to power management events, which messes > up the vblank counting logic. This leads to screen freezes with user space > waiting on vblank events that may not occur if the counter keeps resetting. > > For e.g., After the HW vblank counter resets > [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count > on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 > > So, fall back to the SW counter, computed using vblank timestamps > and frame duration, when the HW counter value deviates by 50% of the SW > computed value. > > I have tested this patch on my SKL laptop with i915.enable_psr=1 and it > *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. > > Known issues: > 1) The 50% deviation margin is arbitrary. > 2) "Redundant vblirq ignored" messages are more frequent. > > I am sending this as an RFC to get feedback on whether the fall back > approach is sane and if it should be implemented in the core. Is there no way for the driver to know under which circumstances the reset to 0 might happen? If there is, maybe it could be solved by calling drm_crtc_vblank_off() before it might happen and drm_crtc_vblank_on() after it might have happened. Otherwise, might it be better not to use the HW counter at all when it's known not to be reliable?
On Tue, Nov 07, 2017 at 10:47:00AM +0100, Michel Dänzer wrote: > On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: > > Some HW vblank counters reset due to power management events, which messes > > up the vblank counting logic. This leads to screen freezes with user space > > waiting on vblank events that may not occur if the counter keeps resetting. > > > > For e.g., After the HW vblank counter resets > > [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count > > on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 > > > > So, fall back to the SW counter, computed using vblank timestamps > > and frame duration, when the HW counter value deviates by 50% of the SW > > computed value. > > > > I have tested this patch on my SKL laptop with i915.enable_psr=1 and it > > *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. > > > > Known issues: > > 1) The 50% deviation margin is arbitrary. > > 2) "Redundant vblirq ignored" messages are more frequent. > > > > I am sending this as an RFC to get feedback on whether the fall back > > approach is sane and if it should be implemented in the core. > > Is there no way for the driver to know under which circumstances the > reset to 0 might happen? If there is, maybe it could be solved by > calling drm_crtc_vblank_off() before it might happen and > drm_crtc_vblank_on() after it might have happened. > > Otherwise, might it be better not to use the HW counter at all when it's > known not to be reliable? Yeah, I think we could just avoid using the HW counter whenever there's a possibility of PSR being used on the crtc. Assuming we still want to use the HW counter on crtcs that can't do PSR, we're going to need some kind of per-crtc mechanism to tell drm_vblank.c which method to use. hwmode.private_flags is one option. We already use it for choosing between the scanline counter and hardware timestamps when calculating the scanout position. But in this case the flag wouldn't be exactly private since drm_vblank.c would have to know about it as well. If that is deemed to be a problem, then we might just need to add a bool to struct drm_vblank_crtc to indicate that the hw counter is good, or maybe even move the dev->max_vblank_count to live inside struct drm_vblank_crtc.
On 07/11/17 11:50 AM, Ville Syrjälä wrote: > On Tue, Nov 07, 2017 at 10:47:00AM +0100, Michel Dänzer wrote: >> On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: >>> Some HW vblank counters reset due to power management events, which messes >>> up the vblank counting logic. This leads to screen freezes with user space >>> waiting on vblank events that may not occur if the counter keeps resetting. >>> >>> For e.g., After the HW vblank counter resets >>> [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count >>> on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 >>> >>> So, fall back to the SW counter, computed using vblank timestamps >>> and frame duration, when the HW counter value deviates by 50% of the SW >>> computed value. >>> >>> I have tested this patch on my SKL laptop with i915.enable_psr=1 and it >>> *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. >>> >>> Known issues: >>> 1) The 50% deviation margin is arbitrary. >>> 2) "Redundant vblirq ignored" messages are more frequent. >>> >>> I am sending this as an RFC to get feedback on whether the fall back >>> approach is sane and if it should be implemented in the core. >> >> Is there no way for the driver to know under which circumstances the >> reset to 0 might happen? If there is, maybe it could be solved by >> calling drm_crtc_vblank_off() before it might happen and >> drm_crtc_vblank_on() after it might have happened. >> >> Otherwise, might it be better not to use the HW counter at all when it's >> known not to be reliable? > > Yeah, I think we could just avoid using the HW counter whenever there's > a possibility of PSR being used on the crtc. > > Assuming we still want to use the HW counter on crtcs that can't do PSR, > we're going to need some kind of per-crtc mechanism to tell drm_vblank.c > which method to use. hwmode.private_flags is one option. We already use > it for choosing between the scanline counter and hardware timestamps when > calculating the scanout position. But in this case the flag wouldn't be > exactly private since drm_vblank.c would have to know about it as well. > If that is deemed to be a problem, then we might just need to add a bool > to struct drm_vblank_crtc to indicate that the hw counter is good, or > maybe even move the dev->max_vblank_count to live inside > struct drm_vblank_crtc. Or just allow setting struct drm_vblank_crtc's get_vblank_counter member to NULL?
On Tue, Nov 07, 2017 at 11:55:44AM +0100, Michel Dänzer wrote: > On 07/11/17 11:50 AM, Ville Syrjälä wrote: > > On Tue, Nov 07, 2017 at 10:47:00AM +0100, Michel Dänzer wrote: > >> On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: > >>> Some HW vblank counters reset due to power management events, which messes > >>> up the vblank counting logic. This leads to screen freezes with user space > >>> waiting on vblank events that may not occur if the counter keeps resetting. > >>> > >>> For e.g., After the HW vblank counter resets > >>> [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count > >>> on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 > >>> > >>> So, fall back to the SW counter, computed using vblank timestamps > >>> and frame duration, when the HW counter value deviates by 50% of the SW > >>> computed value. > >>> > >>> I have tested this patch on my SKL laptop with i915.enable_psr=1 and it > >>> *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. > >>> > >>> Known issues: > >>> 1) The 50% deviation margin is arbitrary. > >>> 2) "Redundant vblirq ignored" messages are more frequent. > >>> > >>> I am sending this as an RFC to get feedback on whether the fall back > >>> approach is sane and if it should be implemented in the core. > >> > >> Is there no way for the driver to know under which circumstances the > >> reset to 0 might happen? If there is, maybe it could be solved by > >> calling drm_crtc_vblank_off() before it might happen and > >> drm_crtc_vblank_on() after it might have happened. > >> > >> Otherwise, might it be better not to use the HW counter at all when it's > >> known not to be reliable? > > > > Yeah, I think we could just avoid using the HW counter whenever there's > > a possibility of PSR being used on the crtc. > > > > Assuming we still want to use the HW counter on crtcs that can't do PSR, > > we're going to need some kind of per-crtc mechanism to tell drm_vblank.c > > which method to use. hwmode.private_flags is one option. We already use > > it for choosing between the scanline counter and hardware timestamps when > > calculating the scanout position. But in this case the flag wouldn't be > > exactly private since drm_vblank.c would have to know about it as well. > > If that is deemed to be a problem, then we might just need to add a bool > > to struct drm_vblank_crtc to indicate that the hw counter is good, or > > maybe even move the dev->max_vblank_count to live inside > > struct drm_vblank_crtc. > > Or just allow setting struct drm_vblank_crtc's get_vblank_counter member > to NULL? Looks like that's actually under drm_crtc_funcs. I'm thinking we want to keep that as const. Not sure we want to add a third way for drivers to provide a .get_vblank_counter() hook (the second one being the old drm_driver.get_vblank_counter() hook, which i915 is still using actually).
On Tue, Nov 07, 2017 at 10:47:00AM +0100, Michel Dänzer wrote: > On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: > > Some HW vblank counters reset due to power management events, which messes > > up the vblank counting logic. This leads to screen freezes with user space > > waiting on vblank events that may not occur if the counter keeps resetting. > > > > For e.g., After the HW vblank counter resets > > [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count > > on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 > > > > So, fall back to the SW counter, computed using vblank timestamps > > and frame duration, when the HW counter value deviates by 50% of the SW > > computed value. > > > > I have tested this patch on my SKL laptop with i915.enable_psr=1 and it > > *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. > > > > Known issues: > > 1) The 50% deviation margin is arbitrary. > > 2) "Redundant vblirq ignored" messages are more frequent. > > > > I am sending this as an RFC to get feedback on whether the fall back > > approach is sane and if it should be implemented in the core. > > Is there no way for the driver to know under which circumstances the > reset to 0 might happen? If there is, maybe it could be solved by > calling drm_crtc_vblank_off() before it might happen and > drm_crtc_vblank_on() after it might have happened. > > Otherwise, might it be better not to use the HW counter at all when it's > known not to be reliable? We know when it happens, so agreed this isn't a good/workable solution really. I thought the plan to fix that was to fix up our runtime pm to make sure the vblank counter doesn't get reset while we need it (pending flip or vblank). And in-between (when the vblank counter is totally off) we'd fix any mismatch by adjusting the sw vblank counter with an explicit call (where we can use the elapsed time to estimate the elapsed vblank counts well enough). Adding a magic hack like this doesn't sound like a good plan to me indeed. -Daniel
On Tue, 2017-11-07 at 13:39 +0100, Daniel Vetter wrote: > On Tue, Nov 07, 2017 at 10:47:00AM +0100, Michel Dänzer wrote: > > On 07/11/17 07:26 AM, Dhinakaran Pandiyan wrote: > > > Some HW vblank counters reset due to power management events, which messes > > > up the vblank counting logic. This leads to screen freezes with user space > > > waiting on vblank events that may not occur if the counter keeps resetting. > > > > > > For e.g., After the HW vblank counter resets > > > [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count > > > on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 > > > > > > So, fall back to the SW counter, computed using vblank timestamps > > > and frame duration, when the HW counter value deviates by 50% of the SW > > > computed value. > > > > > > I have tested this patch on my SKL laptop with i915.enable_psr=1 and it > > > *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. > > > > > > Known issues: > > > 1) The 50% deviation margin is arbitrary. > > > 2) "Redundant vblirq ignored" messages are more frequent. > > > > > > I am sending this as an RFC to get feedback on whether the fall back > > > approach is sane and if it should be implemented in the core. > > > > Is there no way for the driver to know under which circumstances the > > reset to 0 might happen? Not precisely, but the driver does need to allow the firmware to go into lower power states. > If there is, maybe it could be solved by > > calling drm_crtc_vblank_off() before it might happen and > > drm_crtc_vblank_on() after it might have happened. > > > > Otherwise, might it be better not to use the HW counter at all when it's > > known not to be reliable? > > We know when it happens, so agreed this isn't a good/workable solution > really. I thought the plan to fix that was to fix up our runtime pm to > make sure the vblank counter doesn't get reset while we need it (pending > flip or vblank). And in-between (when the vblank counter is totally off) > we'd fix any mismatch by adjusting the sw vblank counter with an explicit > call (where we can use the elapsed time to estimate the elapsed vblank > counts well enough). Adding a magic hack like this doesn't sound like a > good plan to me indeed. > -Daniel Thanks for the feedback, I wasn't aware of an agreed plan. I just went through patchwork history now and realized that the full-time SW counter approach was tried earlier, but not much information on why it was abandoned. Allowing runtime PM (PSR + DMC) to do it's job while letting the SW estimate vblanks if the hardware counter is unreliable didn't seem like a terrible plan. It could save more power fwiw. But, I understand your concerns, will see what I can do. -DK
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 57cc6e37c810..8000aae5f1f7 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -190,11 +190,12 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, bool in_vblank_irq) { struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; - u32 cur_vblank, diff; + u32 cur_vblank; bool rc; ktime_t t_vblank; int count = DRM_TIMESTAMP_MAXRETRIES; int framedur_ns = vblank->framedur_ns; + u32 diff = in_vblank_irq ? 1 : 0; /* * Interrupts were disabled prior to this call, so deal with counter @@ -213,26 +214,31 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, in_vblank_irq); } while (cur_vblank != __get_vblank_counter(dev, pipe) && --count > 0); - if (dev->max_vblank_count != 0) { - /* trust the hw counter when it's around */ + if (dev->max_vblank_count) diff = (cur_vblank - vblank->last) & dev->max_vblank_count; - } else if (rc && framedur_ns) { + + if (rc && framedur_ns) { u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time)); + u32 sw_diff; /* * Figure out how many vblanks we've missed based * on the difference in the timestamps and the * frame/field duration. */ - diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); - - if (diff == 0 && in_vblank_irq) + sw_diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); + if (sw_diff == 0 && in_vblank_irq) DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." " diff_ns = %lld, framedur_ns = %d)\n", pipe, (long long) diff_ns, framedur_ns); - } else { - /* some kind of default for drivers w/o accurate vbl timestamping */ - diff = in_vblank_irq ? 1 : 0; + + if (!dev->max_vblank_count) + diff = sw_diff; + else if (sw_diff && abs(diff - sw_diff) > DIV_ROUND_CLOSEST(sw_diff, 2)) { + DRM_DEBUG_VBL("hw vblank counter(%u) deviates from sw (%u)\n", + diff, sw_diff); + diff = sw_diff; + } } /*
Some HW vblank counters reset due to power management events, which messes up the vblank counting logic. This leads to screen freezes with user space waiting on vblank events that may not occur if the counter keeps resetting. For e.g., After the HW vblank counter resets [ 9.007359] [drm:drm_update_vblank_count [drm]] updating vblank count on crtc 0: current=297, diff=4294965389, hw=5 hw_last=1912 So, fall back to the SW counter, computed using vblank timestamps and frame duration, when the HW counter value deviates by 50% of the SW computed value. I have tested this patch on my SKL laptop with i915.enable_psr=1 and it *seems* to solve the screen freeze issue seen with PSR when DMC is loaded. Known issues: 1) The 50% deviation margin is arbitrary. 2) "Redundant vblirq ignored" messages are more frequent. I am sending this as an RFC to get feedback on whether the fall back approach is sane and if it should be implemented in the core. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> --- drivers/gpu/drm/drm_vblank.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)