Message ID | 1367485203-4533-1-git-send-email-imre.deak@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, May 02, 2013 at 12:00:03PM +0300, Imre Deak wrote: > Due to possible scheduling latencies wait_event_timeout doesn't > guarantee a non-zero return value, even if the condition becomes true > before the specified timeout expires. Thus we can incorrectly signal a > timeout and abort a DP AUX transaction. > > If wait_event_timeout returns 0, it's guaranteed that at least the > specified timeout (minus one jiffies, see below) had passed, so we can > fix this by checking the condition explicitly in this case. > > Also the timeout that wait_event_timeout() is guaranteed to wait if the > condition doesn't become true is one less jiffies than what is passed to > it as a parameter. This is because the absolute expiration time in > schedule_timeout() may be calculated at a moment close to the next > scheduling tick, when jiffies is incremented. So make sure we pass always > a jiffies value of 2 or greater. Here this makes a difference only for > HZ=100. > > This fixes DP AUX errors I saw during booting on an ILK. > > This should ideally be fixed in wait_event_timeout(), but that can take > a while. Until that's done use this fix as a band-aid. As we have 3 such vulnerable callsite in our driver alone, perhaps we should push for your general fix. -Chris
On Thu, May 2, 2013 at 11:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Thu, May 02, 2013 at 12:00:03PM +0300, Imre Deak wrote: >> Due to possible scheduling latencies wait_event_timeout doesn't >> guarantee a non-zero return value, even if the condition becomes true >> before the specified timeout expires. Thus we can incorrectly signal a >> timeout and abort a DP AUX transaction. >> >> If wait_event_timeout returns 0, it's guaranteed that at least the >> specified timeout (minus one jiffies, see below) had passed, so we can >> fix this by checking the condition explicitly in this case. >> >> Also the timeout that wait_event_timeout() is guaranteed to wait if the >> condition doesn't become true is one less jiffies than what is passed to >> it as a parameter. This is because the absolute expiration time in >> schedule_timeout() may be calculated at a moment close to the next >> scheduling tick, when jiffies is incremented. So make sure we pass always >> a jiffies value of 2 or greater. Here this makes a difference only for >> HZ=100. >> >> This fixes DP AUX errors I saw during booting on an ILK. >> >> This should ideally be fixed in wait_event_timeout(), but that can take >> a while. Until that's done use this fix as a band-aid. > > As we have 3 such vulnerable callsite in our driver alone, perhaps we > should push for your general fix. Yeah, I think this should be fixed in general, doesn't really make much sense given that the jiffies value is relative this way. Also, the actual timeout we need to wait is just 400 us, so I think we're pretty safe here. Can you please resumbit just the recheck part of your patch? Also, can you do the same fix for gmbus_wait_idle in intel_i2c.c please? Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, May 2, 2013 at 11:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Thu, May 2, 2013 at 11:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: >> On Thu, May 02, 2013 at 12:00:03PM +0300, Imre Deak wrote: >>> Due to possible scheduling latencies wait_event_timeout doesn't >>> guarantee a non-zero return value, even if the condition becomes true >>> before the specified timeout expires. Thus we can incorrectly signal a >>> timeout and abort a DP AUX transaction. >>> >>> If wait_event_timeout returns 0, it's guaranteed that at least the >>> specified timeout (minus one jiffies, see below) had passed, so we can >>> fix this by checking the condition explicitly in this case. >>> >>> Also the timeout that wait_event_timeout() is guaranteed to wait if the >>> condition doesn't become true is one less jiffies than what is passed to >>> it as a parameter. This is because the absolute expiration time in >>> schedule_timeout() may be calculated at a moment close to the next >>> scheduling tick, when jiffies is incremented. So make sure we pass always >>> a jiffies value of 2 or greater. Here this makes a difference only for >>> HZ=100. >>> >>> This fixes DP AUX errors I saw during booting on an ILK. >>> >>> This should ideally be fixed in wait_event_timeout(), but that can take >>> a while. Until that's done use this fix as a band-aid. >> >> As we have 3 such vulnerable callsite in our driver alone, perhaps we >> should push for your general fix. > > Yeah, I think this should be fixed in general, doesn't really make > much sense given that the jiffies value is relative this way. Also, > the actual timeout we need to wait is just 400 us, so I think we're > pretty safe here. Can you please resumbit just the recheck part of > your patch? > > Also, can you do the same fix for gmbus_wait_idle in intel_i2c.c please? Oops, just now I've seen your generic patch. Yeah, I like that much better ;-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, 2013-05-02 at 10:17 +0100, Chris Wilson wrote: > On Thu, May 02, 2013 at 12:00:03PM +0300, Imre Deak wrote: > > Due to possible scheduling latencies wait_event_timeout doesn't > > guarantee a non-zero return value, even if the condition becomes true > > before the specified timeout expires. Thus we can incorrectly signal a > > timeout and abort a DP AUX transaction. > > > > If wait_event_timeout returns 0, it's guaranteed that at least the > > specified timeout (minus one jiffies, see below) had passed, so we can > > fix this by checking the condition explicitly in this case. > > > > Also the timeout that wait_event_timeout() is guaranteed to wait if the > > condition doesn't become true is one less jiffies than what is passed to > > it as a parameter. This is because the absolute expiration time in > > schedule_timeout() may be calculated at a moment close to the next > > scheduling tick, when jiffies is incremented. So make sure we pass always > > a jiffies value of 2 or greater. Here this makes a difference only for > > HZ=100. > > > > This fixes DP AUX errors I saw during booting on an ILK. > > > > This should ideally be fixed in wait_event_timeout(), but that can take > > a while. Until that's done use this fix as a band-aid. > > As we have 3 such vulnerable callsite in our driver alone, perhaps we > should push for your general fix. Yes, I've sent a patch for wait_event_timeout() to start the discussion, I'm not sure how long it will take.. --Imre
On Thu, 2013-05-02 at 11:33 +0200, Daniel Vetter wrote: > On Thu, May 2, 2013 at 11:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Thu, May 2, 2013 at 11:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > >> On Thu, May 02, 2013 at 12:00:03PM +0300, Imre Deak wrote: > >>> Due to possible scheduling latencies wait_event_timeout doesn't > >>> guarantee a non-zero return value, even if the condition becomes true > >>> before the specified timeout expires. Thus we can incorrectly signal a > >>> timeout and abort a DP AUX transaction. > >>> > >>> If wait_event_timeout returns 0, it's guaranteed that at least the > >>> specified timeout (minus one jiffies, see below) had passed, so we can > >>> fix this by checking the condition explicitly in this case. > >>> > >>> Also the timeout that wait_event_timeout() is guaranteed to wait if the > >>> condition doesn't become true is one less jiffies than what is passed to > >>> it as a parameter. This is because the absolute expiration time in > >>> schedule_timeout() may be calculated at a moment close to the next > >>> scheduling tick, when jiffies is incremented. So make sure we pass always > >>> a jiffies value of 2 or greater. Here this makes a difference only for > >>> HZ=100. > >>> > >>> This fixes DP AUX errors I saw during booting on an ILK. > >>> > >>> This should ideally be fixed in wait_event_timeout(), but that can take > >>> a while. Until that's done use this fix as a band-aid. > >> > >> As we have 3 such vulnerable callsite in our driver alone, perhaps we > >> should push for your general fix. > > > > Yeah, I think this should be fixed in general, doesn't really make > > much sense given that the jiffies value is relative this way. Also, > > the actual timeout we need to wait is just 400 us, so I think we're > > pretty safe here. Can you please resumbit just the recheck part of > > your patch? Well, without the +1 to msecs_to_jiffies(10) part, I still can trigger the problem on HZ=100. Since w/o that we pass 1 jiffies and wait less than 400us.. > > > > Also, can you do the same fix for gmbus_wait_idle in intel_i2c.c please? Yes, can do this. --Imre > > Oops, just now I've seen your generic patch. Yeah, I like that much better ;-) > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 8759fb1..7ce96c5 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -303,10 +303,10 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq) #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) if (has_aux_irq) done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, - msecs_to_jiffies(10)); + msecs_to_jiffies(10) + 1); else done = wait_for_atomic(C, 10) == 0; - if (!done) + if (!done && !(C)) DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n", has_aux_irq); #undef C
Due to possible scheduling latencies wait_event_timeout doesn't guarantee a non-zero return value, even if the condition becomes true before the specified timeout expires. Thus we can incorrectly signal a timeout and abort a DP AUX transaction. If wait_event_timeout returns 0, it's guaranteed that at least the specified timeout (minus one jiffies, see below) had passed, so we can fix this by checking the condition explicitly in this case. Also the timeout that wait_event_timeout() is guaranteed to wait if the condition doesn't become true is one less jiffies than what is passed to it as a parameter. This is because the absolute expiration time in schedule_timeout() may be calculated at a moment close to the next scheduling tick, when jiffies is incremented. So make sure we pass always a jiffies value of 2 or greater. Here this makes a difference only for HZ=100. This fixes DP AUX errors I saw during booting on an ILK. This should ideally be fixed in wait_event_timeout(), but that can take a while. Until that's done use this fix as a band-aid. Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/intel_dp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)